Abstract

We address, from a Bayesian perspective, the problem of providing inference
for parameters selected after viewing the data. A frequentist solution to this
problem is constructing False Coverage-statement Rate adjusted confidence
intervals for the subset of selected parameters. We illustrate the limitations
of the frequentist solution. We argue that if the parameter is assigned a
non-informative prior, or if it is a "fixed" effect that is generated before
selection is applied, then it is necessary to adjust the Bayesian inference
for selection. Our main contribution is a Bayesian framework for providing
inference for selected parameters, based on the observation that, from a
Bayesian perspective, providing inference for a selected parameter is a
truncation problem. Our second contribution is the introduction of Bayesian
FDR controlling methodology that generalizes existing Bayesian FDR methods to
the case of non-dichotomous parameters. We illustrate our results by applying
them to simulated data and to data from a microarray experiment.

1 Introduction 

The multiplicity problem is often identified in the statistical literature with the 
problem of selective and simultaneous inference. Benjamini and Yekutieli (2005) 
argue that the problem of selective inference and the simultaneity problem are 
two distinct problems encountered when trying to provide statistical inference 
for multiple parameters. Simultaneity refers to the need to provide inferences 
that apply to all the parameters, e.g. marginal confidence intervals that cover 
all the parameters with probability 0.95. A solution to this problem is Family 
Wise Error Rate adjusted inference. Selective inference refers to inference that is 
provided for parameters specified after viewing the data. The topic of this paper 
is Bayesian selective inference. We begin by describing a frequentist solution 
to the problem of selective inference, discussing selective inference in Genomic 
association studies, and reviewing several aspects of Bayesian analysis that are
relevant to our work. 

1.1 Control over the false coverage-statement rate 

Soric (1989) asserted that the goal of many scientific experiments is to discover 
non-zero effects, made the important observation that it is mainly the discov- 
eries that are reported and included into science, and warned that unless the 
proportion of false discoveries in the set of declared discoveries is kept small 
there is danger that a large part of science is untrue. 

Benjamini and Hochberg (1995), hereafter BH, considered the problem of
testing $m$ null hypotheses $H_1, \ldots, H_m$, of which $m_0$ are true null
hypotheses. They referred to the rejection of a null hypothesis as a discovery
and to the rejection of a true null hypothesis as a false discovery. To limit
the occurrence of false discoveries when testing multiple null hypotheses, BH
introduced the False Discovery Rate, $FDR = E\{V/\max(R, 1)\}$, where $R$ is
the number of discoveries and $V$ is the number of false discoveries, and
introduced the BH multiple testing procedure that controls the FDR at a
nominal level $q$.
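As a minimal sketch of the BH step-up rule just described (the function name `bh_select` and the toy p-values are illustrative assumptions, not taken from the paper): sort the p-values, find the largest rank $k$ with $p_{(k)} \le k q / m$, and reject the $k$ smallest.

```python
def bh_select(pvalues, q):
    """Indices rejected by the level-q Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    # Indices ordered from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up: keep the largest rank k whose ordered p-value is <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max smallest p-values (possibly none).
    return sorted(order[:k_max])


# Toy p-values, for illustration only.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
toy_rejected = bh_select(toy_pvalues, q=0.05)
```

Because the rule is step-up, a small p-value that misses its own threshold is still rejected whenever a larger p-value meets its threshold.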

Benjamini and Yekutieli (2005) generalized the Benjamini and Hochberg
testing framework. In their parameter selection framework there are $m$
parameters $\theta_1, \ldots, \theta_m$, with corresponding estimators
$T_1, \ldots, T_m$, and the goal is to construct valid confidence intervals
(CIs) for the subset of parameters selected by a given selection rule
$S(t_1, \ldots, t_m) \subseteq \{1, \ldots, m\}$. They showed that CIs
constructed for selected parameters no longer ensure nominal coverage
probability, and suggested the False Coverage-statement Rate (FCR) as the
appropriate criterion to capture the error for CIs constructed for selected
parameters. The FCR is also defined as $E\{V/\max(R, 1)\}$; however, here $R$
is the number of CIs constructed and $V$ is the number of non-covering CIs.
Benjamini and Yekutieli (2005) introduced a method of ensuring $FCR \le q$ for
independent $T_1, \ldots, T_m$ and any selection criterion: construct marginal
$1 - R \cdot q/m$ CIs for each of the $R$ selected parameters. In cases where
each $\theta_i$ can be associated with a null value $\theta_i^0$ and the
selection criteria are multiple testing procedures that test
$\theta_i = \theta_i^0$ vs. $\theta_i \ne \theta_i^0$, Benjamini and Yekutieli
(2005) showed that the level $q$ BH procedure can be expressed as the least
conservative multiple testing procedure that ensures that all level $q$ FCR
adjusted CIs for $\theta_i$ for which the null hypothesis is rejected will not
cover the respective $\theta_i^0$. Furthermore, they show that for independent
$T_1, \ldots, T_m$, if all $\theta_i \ne \theta_i^0$ then applying the level
$q$ BH procedure to select the parameters, and declaring each selected
$\theta_i$ greater than $\theta_i^0$ if $T_i > \theta_i^0$ and smaller than
$\theta_i^0$ if $T_i < \theta_i^0$, controls the directional FDR (the expected
proportion of selected parameters assigned the wrong sign) at level $q/2$.

Example 1.1 Throughout the paper we use the following simulated example
to illustrate the discussion. The simulation includes $10^5$ iid samples of
$(\theta_i, Y_i)$. To generate $\theta_i$, we first sample $\lambda_i$ from
$\{10, 1\}$ with probabilities 0.90 and 0.10, and then draw $\theta_i$ from the
absolute valued exponential density,
$\pi_1(\theta_i | \lambda_i) = \lambda_i \cdot \exp(-\lambda_i \cdot |\theta_i|)/2$;
$Y_i = \theta_i + \epsilon_i$, with independent $\epsilon_i \sim N(0, 1)$.

The selection rule is the level $q = 0.2$ BH procedure applied to the two-sided
p-values $P_i = 2 \cdot \{1 - \Phi(|Y_i|)\}$, yielding $R = 932$ discoveries
($p_{(932)} = 0.001862 < 0.001864 = 0.2 \cdot 932/10^5$) with $|Y_i| > 3.111$:
$\theta_i$ is declared positive for $Y_i > 3.111$ and negative for
$Y_i < -3.111$. As all $\theta_i \ne 0$, this ensures directional-FDR less
than 0.1. The number of positive selected $\theta_i$ with negative $Y_i$ and
negative selected $\theta_i$ with positive $Y_i$ is 56, thus the realized
directional-FDR is 0.060.

The 932 selected components are displayed in Figure 1. The abscissa of the
plot corresponds to $Y_i$, the ordinates are $\theta_i$. The red lines are
two-sided Normal 0.95 CIs: $Y_i \pm Z_{1 - 0.05/2}$. The Normal 0.95 CIs cover
95,089 of the 100,000 simulated $\theta_i$, but only 610 of the 932 selected
$\theta_i$, thus the observed FCR is 0.346. The green lines are 0.05
FCR-adjusted CIs: $Y_i \pm Z_{1 - 0.05 \cdot 932/(2 \cdot 10^5)}$. The
observed FCR for the FCR adjusted CIs is 0.046.
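The simulation in Example 1.1 can be re-created along the following lines. This is a sketch under stated assumptions: the seed, the function name `simulate_example` and the returned dictionary keys are ours, and a different random stream will not reproduce the exact counts quoted above (932 discoveries, 56 sign errors), only their order of magnitude.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard Normal: cdf plays the role of Phi, inv_cdf of Z


def simulate_example(m=100_000, q_select=0.2, q_ci=0.05, seed=2024):
    rng = random.Random(seed)  # the seed is an arbitrary choice
    theta, y = [], []
    for _ in range(m):
        lam = 10.0 if rng.random() < 0.9 else 1.0
        # Double-exponential draw: exponential magnitude with a random sign.
        t = rng.choice((-1.0, 1.0)) * rng.expovariate(lam)
        theta.append(t)
        y.append(t + rng.gauss(0.0, 1.0))

    # Two-sided p-values 2{1 - Phi(|y|)}, written as 2 Phi(-|y|).
    pvals = [2.0 * STD.cdf(-abs(v)) for v in y]

    # Level-q BH step-up: R is the largest rank k with p_(k) <= k q / m.
    order = sorted(range(m), key=pvals.__getitem__)
    r = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q_select / m:
            r = rank
    selected = order[:r]

    z_plain = STD.inv_cdf(1.0 - q_ci / 2.0)                  # marginal 0.95 CI
    z_fcr = STD.inv_cdf(1.0 - q_ci * max(r, 1) / (2.0 * m))  # FCR-adjusted CI

    def observed_fcr(half_width):
        # Share of selected parameters missed by the CI y_i +/- half_width.
        misses = sum(1 for i in selected if abs(theta[i] - y[i]) > half_width)
        return misses / max(r, 1)

    wrong_sign = sum(1 for i in selected if theta[i] * y[i] < 0.0)
    return {
        "R": r,
        "z_fcr": z_fcr,
        "fcr_plain": observed_fcr(z_plain),
        "fcr_adjusted": observed_fcr(z_fcr),
        "directional_fdr": wrong_sign / max(r, 1),
    }


result = simulate_example()
```

With $R = 932$ and $m = 10^5$ the adjusted quantile is $Z_{1 - 0.05 \cdot 932/(2 \cdot 10^5)} \approx 3.50$, so the FCR-adjusted intervals are wider than the marginal ones but are still centred at $Y_i$.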

1.2 Selective inference in Genomic association studies 

The need to correct inference for selection is widely recognized in Genome- 
wide association studies (GWAS). GWAS typically test association between a 
disease and hundreds of thousands of markers located throughout the human 
genome, often expressed as an odds ratio of manifesting the disease in carriers 
of a risk allele. Only multiplicity-adjusted significant findings are reported. 
This limits the occurrence of false positives, however it introduces bias into 
the odds ratio estimates. Analyzing 301 published studies covering 25 different 
reported associations, Lohmueller et al. (2003) found that for 24 associations 
the odds ratio in the first positive report exceeded the genetic effect estimated 
by meta-analysis of the remaining studies. Zollner and Pritchard (2007) suggest 
correcting for the selection bias by providing point estimates and CIs based on 
the likelihood conditional on having observed a significant association. Zhong 
and Prentice (2008) further assume that in the absence of selection the log odds
ratio estimator is Normally distributed. Similarly to our Bayesian analysis of the 
simulated example, they base their inference on a truncated normal conditional 
likelihood. 

1.3 Parameter selection in Bayesian analysis 

Berry and Hochberg (1999) comment that the Bayesian treatment of the multi- 
plicity problem also includes decision analysis, rather than just finding posterior 
distributions. 

Scott and Berger (2006) discuss Bayesian analysis of microarray data. The
prior model for $\theta_i$, the expectation of the log-fold change in
expression of Gene $i$, is that $\theta_i = 0$ with probability $p$ and
$\theta_i \sim N(0, V)$ with probability $1 - p$. The decision analysis
performed in Scott and Berger (2006) is the discovery of the subset of active
genes. Scott and Berger (2006) declare a gene active ($\theta_i \ne 0$) if
the posterior expected loss of this action is smaller than the posterior
expected loss of declaring the gene inactive ($\theta_i = 0$). The loss for
deciding that $\theta_i = 0$ is proportional to $|\theta_i|$, while the loss
for erroneously deciding that $\theta_i \ne 0$ is the fixed cost of doing a
targeted experiment to verify that the gene is in fact active.

In Bayesian FDR analysis of microarray data the decision analysis is also
deciding which genes are active. However, instead of specifying Bayes rules
for selecting active genes that minimize the loss incurred by selecting
inactive genes and failing to select active genes, the selection rules are
specified through posterior probabilities. In Efron et al. (2001), $\theta_i$
is selected if the posterior probability given $y_i$ that $\theta_i = 0$ is
less than a nominal value $q$, while Storey (2002, 2003) suggests using
selection rules that ensure that the probability that $\theta_i$ is falsely
selected is less than $q$.
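To make the posterior-probability rule concrete, here is a hedged sketch for the two-group prior described above ($\theta_i = 0$ with probability $p$, $\theta_i \sim N(0, V)$ otherwise). The sampling model $y_i | \theta_i \sim N(\theta_i, \text{noise\_var})$, the function names and the numerical inputs are assumptions made for illustration; they are not specified in the text.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_null_prob(y, p_null, v_active, noise_var=1.0):
    """Pr(theta = 0 | y) under the two-group prior.

    Assumed sampling model: y | theta ~ N(theta, noise_var), so the marginal
    of y is N(0, noise_var) for inactive genes and N(0, noise_var + v_active)
    for active genes.
    """
    null = p_null * normal_pdf(y, noise_var)
    active = (1.0 - p_null) * normal_pdf(y, noise_var + v_active)
    return null / (null + active)


def select_active(ys, q, p_null, v_active, noise_var=1.0):
    """Select component i when its posterior null probability is below q."""
    return [i for i, y in enumerate(ys)
            if posterior_null_prob(y, p_null, v_active, noise_var) < q]


# Hypothetical inputs, for illustration only.
picked = select_active([0.1, 5.0, -4.5], q=0.05, p_null=0.9, v_active=4.0)
```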

1.4 Selection bias in Bayesian analysis 

Selection is considered to have no effect on Bayesian inference. Dawid (1994)
explains: "Since Bayesian posterior distributions are already fully conditioned
on the data, the posterior distribution of any quantity is the same, whether it
was chosen in advance or selected in the light of the data." Senn (2008)
reviews the disagreement between Bayesian and frequentist approaches regarding
selection. He considers the example of providing inference for $\mu_{i^*}$,
the effect of the pharmaceutical associated with the largest sample mean
$y_{i^*}$, among a class of $m$ compounds with $Y_i \sim N(\mu_i, 4)$. He
first shows that if $\mu_i$ are iid $N(0, 1)$ the posterior distribution of
$\mu_{i^*}$ is $N(y_{i^*}/5, 4/5)$. He then assumes a hierarchical model in
which the treatments form a compound class. The class effect is
$\lambda \sim N(0, 1 - \gamma^2)$ and $\mu_i$ are iid $N(\lambda, \gamma^2)$.
In this case he shows that the posterior distribution of $\mu_{i^*}$ depends
on the number of other compounds and their overall mean; however, it is
unaffected by the fact that $\mu_{i^*}$ was selected because it corresponds to
the largest sample mean.
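The first of these posteriors is the standard conjugate Normal update; a short worked derivation, using only the $N(0, 1)$ prior and the $N(\mu_i, 4)$ sampling model stated above:

```latex
\begin{align*}
\pi(\mu_i \mid y_i)
  &\propto \exp\!\left(-\frac{\mu_i^2}{2}\right)
           \exp\!\left(-\frac{(y_i - \mu_i)^2}{2 \cdot 4}\right), \\
\text{posterior precision}
  &= 1 + \frac{1}{4} = \frac{5}{4},
  \qquad \text{posterior variance} = \frac{4}{5}, \\
\text{posterior mean}
  &= \frac{4}{5} \cdot \frac{y_i}{4} = \frac{y_i}{5}.
\end{align*}
```

Nothing in this update refers to how the index $i$ was chosen, which is the point of Dawid's argument.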

Mandel and Rinott (2009) discuss the example of a regulator trying to assess
drug toxicity in a Phase I study, in which the pharmaceutical company performs
toxicity experiments on multiple drugs and drug doses but only discloses the
results of successful experiments with few adverse events.
$X_i \sim Binom(n, p_i)$ is the number of adverse events in each experiment.
Thus the regulator bases his inference on $\{X_{t_1}, X_{t_2}, \ldots\}$, where
$\tau_j(x) = t_j$ is the index of the $j$-th successful experiment. They show
that if $p_1, p_2, \ldots$ are independent then the Bayesian inference of the
regulator, for a "safe" drug $p_{t_j}$, is the same as that of the company,
and it is unaffected by selection. If, however, $p_1, p_2, \ldots$ are
dependent, in particular if the company repeatedly tests the same drug until
it is found safe, the Bayesian inference obtained by the regulator is affected
by selection: it is different from the Bayesian inference obtained by the
company, and different from the Bayesian inference he would obtain under the
independence model.

1.5 Fixed and random effects in Bayesian analysis 

In the Bayesian framework there can be no fixed effects since the parameters
are regarded as having probability distributions. However, discussing one-way
classification, Box and Tiao (1973, Section 7.2) use the sampling theory
terminology of fixed and random effects to distinguish between situations in
which the individual means can be regarded as distinct values, expected to
bear no strong relationship one to another, that can take values anywhere
within a wide range, and situations in which the individual means can be
regarded as drawings from a distribution. Box and Tiao illustrate this
distinction with the example of one-way classification of several groups of
laboratory yields. In the first case the groups correspond to different
methods of making a particular chemical product, while in the second case the
groups correspond to different batches made by the same method. The
distinction only carries through to the prior model elicited for the group
means. In the first case the group means are assigned flat non-informative
priors. They call this model the fixed effect model. In the second case the
group means are iid $N(\lambda, \sigma^2)$. This model is called the
random effect model.

1.6 Preliminary definitions and outline of the paper 

Let $\theta$ denote the parameter and $Y$ denote the data; $\Omega$ is the
sample space of $Y$; $\pi(\theta)$ is the prior distribution of $\theta$ and
$f(y | \theta)$ is the likelihood function. We define selective inference as
inference provided for a function of the parameter, $h(\theta)$, that is given
only if $y \in S_\Omega$ is observed, for a given subset
$S_\Omega \subseteq \Omega$. For example, in our analysis of microarray data
in Section 6, $Y$ is the entire set of observed gene expression levels;
$\theta = (\sigma^2, \mu)$ consists of the variances and expectations of the
log-expression levels for all the genes in the array; and inference is
provided for $h(\theta) = \mu_g$, the expectation of the log-fold change in
expression of Gene $g$, only if Gene $g$ is declared differentially expressed,
by the BH procedure or the Bayesian FDR controlling selection rules introduced
in Section 4.

Control over the FCR is a frequentist mechanism for providing selective
inference. Notice that in Example 1.1 a random selected $\theta_i$ is covered
by its FCR-adjusted CI with probability $\ge 0.95$. But this frequentist
selective inference mechanism suffers from several intrinsic limitations: it
is impossible to incorporate prior information on the parameters; it does not
provide selection-adjusted point estimates or selection-adjusted inference for
functions of the parameters; and the selection adjustment is the same
regardless of the selection criterion applied and the value of the estimator.
Figure 1 suggests that the selection adjustment needed is shrinking the CIs
toward 0, rather than just widening the CIs, and that smaller selection
adjustments are needed for $\theta_i$ with large $|Y_i|$.

In selective inference the entire data set $Y = y$ is observed. However, as
inference is provided for $h(\theta)$ only if $y \in S_\Omega$, the $Y = y$
used for providing selective inference for $h(\theta)$ is actually a
realization of the joint distribution of $(\theta, Y)$, truncated by the event
that $y \in S_\Omega$. Thus in order to provide Bayesian selective inference
for $h(\theta)$ we define a framework for providing Bayesian inference based
on the truncated distribution of $(\theta, Y)$. We call this inference
selection-adjusted Bayesian (saBayes) inference (describing Bayesian selective
inference as a truncation problem was suggested by Bradley Efron in private
communication; for a discussion on truncation see Mandel (2007) and Gelman et
al. (2004) Section 7.8).

In Section 2, in order to define the components of saBayes inference (the
selection-adjusted prior distribution $\pi_S(\theta)$, the selection-adjusted
likelihood function $f_S(y | \theta)$ and the selection-adjusted posterior
distribution $\pi_S(\theta | y)$), we study the effect of truncation on the
marginal distribution of $\theta$ and the conditional distributions of
$Y | \theta$ and $\theta | Y = y$, in a generative model in which $\theta$ is
sampled from $\pi(\theta)$ and $Y | \theta$ is sampled from $f(y | \theta)$.
We specifically consider $(\theta, Y)$ generated by models that correspond to
Box and Tiao's random effect and fixed effect models. We also consider the
case that $\pi(\theta)$ is a non-informative prior, for which the generative
model for $\theta$ does not apply.

In Section 3 we formally define saBayes inference, supporting our observation 
that saBayes inference should be used for providing Bayesian selective inference, 
by showing that the actions that minimize the selection-adjusted posterior ex- 
pected loss are Bayes rules in selective inference. We also define a Bayesian 
FCR for the random effect model and explain the relation between saBayes in- 
ference and providing FCR control. In Section 4 we define the Bayesian FDR 
as a special case of the Bayesian FCR; present methodology for specifying
selection rules that control the FDR in the random effect model; and explain
how this methodology can be applied to control the FDR in eBayes analysis. In Section
5 we show that the Bayesian FDR methods presented in Section 4 are gener- 
alizations of the existing Bayesian FDR methods and describe how to provide 
Bayes inference for selected parameters in the two group mixture model. 

In Section 6 we analyze microarray data for which the level 0.10 BH proce- 
dure applied to t statistic p-values fails to discover any differentially expressed 
genes. Applying the level 0.10 BH procedure to p-values corresponding to
hybrid frequentist/eBayes moderated t-statistics does manage to discover 245
differentially expressed genes; however, it is not clear how to provide
frequentist selective inference for these discoveries. We show that our level 0.05 Bayesian
FDR selection rule based on the moderated t-statistic yields 1124 discoveries 
and that our level 0.05 Bayesian FDR selection rule based on the optimal statis- 
tic yields 1271 discoveries, and we provide Bayesian selective inference for the 
expected log2-fold change in expression of a specific differentially expressed gene. 

2 Modelling saBayes inference 

2.1 Fixed and random effects in Bayesian selective infer- 
ence 

The most important step in providing Bayesian selective inference is
determining the way that selection acts on the parameter. A parameter is,
intrinsically, either a "fixed" effect if it is generated before the data is
generated and selection is applied, a "random" effect if it is generated with
the data and selection is applied to it, or a "mixed" effect if it is
constructed of "fixed" and "random" effects. For example, in the microarray
data analysis in Section 6 the parameters for Gene $g = 1, \ldots, G$ are
$\mu_g$, the expected change in expression due to the Swirl mutation, and
$\sigma_g^2$, the measurement error variance. Both
$\mu = \{\mu_1, \ldots, \mu_G\}$ and
$\sigma^2 = \{\sigma_1^2, \ldots, \sigma_G^2\}$ are regarded as having
probability distributions. Since the values of the components of $\sigma^2$
are expected to vary according to the specific conditions of the experiment,
$\sigma^2$ is a "random" effect, while $\mu$, the vector of (unknown)
biological constants, is a "fixed" effect, and
$\theta_g = (\mu_g, \sigma_g^2)$ is a "mixed" effect.

To define the components of saBayes inference, we derive the truncated
distribution of $\theta$ and $y$ in a generative model in which
$\theta \sim \pi(\theta)$ and $Y | \theta \sim f(y | \theta)$,
when $\theta$ is either a "fixed", "random" or "mixed" effect.

The "fixed" effect truncated sampling model. When $\theta$ is a "fixed"
effect, first $\theta$ is sampled from $\pi(\theta)$ and then selection, given
by the event $S = S_\Omega \subseteq \Omega$, is applied to $Y$. The truncated
conditional distribution of $Y$ given $\theta$ is

$f_S(y | \theta) = I_{S_\Omega}(y) \cdot f(y | \theta) / \Pr(S_\Omega | \theta)$. (1)

But as selection is applied after $\theta$ is generated, it has no effect on
the truncated marginal distribution of $\theta$,

$\pi_S(\theta) = \pi(\theta)$, (2)

thus the joint truncated distribution of $(\theta, Y)$ is given by

$\pi_S(\theta) \cdot f_S(y | \theta)$. (3)



The "random" effect truncated sampling model. In this case selection,
given by the event $S = \{(\theta, y) : y \in S_\Omega\}$, is applied to
$(\theta, Y)$. The joint truncated distribution of $(\theta, Y)$ is the
conditional density of $(\theta, Y)$ given $S$,

$\frac{I_{S_\Omega}(y) \cdot \pi(\theta) \cdot f(y | \theta)}{\int_S \pi(\theta) \cdot f(y | \theta) \, d\theta \, dy} = \frac{I_{S_\Omega}(y) \cdot \pi(\theta) \cdot f(y | \theta)}{\Pr(S)}$. (4)

Notice that in this case the joint truncated density of $(\theta, Y)$ is
proportional to the joint density of $(\theta, Y)$,
$\pi(\theta) \cdot f(y | \theta)$. Integrating over $Y$ yields the marginal
truncated distribution of $\theta$,

$\pi_S(\theta) = \int_{S_\Omega} \frac{\pi(\theta) \cdot f(y | \theta)}{\Pr(S)} \, dy = \frac{\pi(\theta) \cdot \Pr(S_\Omega | \theta)}{\Pr(S)}$. (5)

Dividing (4) by (5) reveals that also in this case the truncated conditional
distribution of $Y$ given $\theta$ is $f_S(y | \theta)$ in (1), and that the
joint truncated density of $(\theta, Y)$ can be expressed as
$\pi_S(\theta) \cdot f_S(y | \theta)$.
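The difference between the two sampling models can be seen in a small Monte Carlo sketch. The setting is an assumption chosen for illustration: $\theta \sim N(0, 1)$, $Y | \theta \sim N(\theta, 1)$ and $S_\Omega = \{y : y > 0\}$; the function names and the seed are ours. In the "fixed" model $\theta$ is drawn once and only $Y$ is truncated, drawn here by inverse-CDF sampling, which is equivalent to redrawing $Y$ until $y > 0$; in the "random" model the pair $(\theta, Y)$ is redrawn until $y > 0$.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def draw_fixed(rng):
    """theta ~ N(0,1) is kept; only Y | theta is truncated to y > 0, as in (1)-(3)."""
    theta = rng.gauss(0.0, 1.0)
    lo = STD.cdf(-theta)                  # Pr(Y <= 0 | theta)
    u = lo + (1.0 - lo) * rng.random()    # uniform on (Pr(Y <= 0 | theta), 1)
    u = min(max(u, 1e-15), 1.0 - 1e-15)   # keep inv_cdf inside its domain
    return theta, theta + STD.inv_cdf(u)


def draw_random(rng):
    """The pair (theta, Y) is redrawn until y > 0, as in (4)."""
    while True:
        theta = rng.gauss(0.0, 1.0)
        y = rng.gauss(theta, 1.0)
        if y > 0.0:
            return theta, y


rng = random.Random(7)  # arbitrary seed
n = 20_000
fixed_draws = [draw_fixed(rng) for _ in range(n)]
random_draws = [draw_random(rng) for _ in range(n)]

mean_theta_fixed = sum(t for t, _ in fixed_draws) / n
mean_theta_random = sum(t for t, _ in random_draws) / n
```

Under these assumptions the marginal mean of $\theta$ stays at 0 in the "fixed" model, as in (2), while in the "random" model it moves to $E(\theta | Y > 0) = 1/\sqrt{\pi} \approx 0.56$, in line with (5).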



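The "mixed" effect truncated sampling model. We consider the "mixed"
$\theta$ truncated distribution of $(\theta, Y)$ in a hierarchical generative
model in which $\lambda \sim \pi_2(\lambda)$ is a "fixed" hyperparameter and
$\theta | \lambda$ is a "random" effect sampled from
$\pi_1(\theta | \lambda)$. In this case $\lambda$ is sampled first and then
selection, given by $S = \{(\theta, y) : y \in S_\Omega\}$, is applied to
$(\theta, Y)$. Thus the joint truncated density of $(\lambda, \theta, Y)$ is

$\frac{I_{S_\Omega}(y) \cdot \pi_2(\lambda) \cdot \pi_1(\theta | \lambda) \cdot f(y | \theta)}{\int_S \pi_1(\theta | \lambda) \cdot f(y | \theta) \, d\theta \, dy} = \frac{I_{S_\Omega}(y) \cdot \pi_2(\lambda) \cdot \pi_1(\theta | \lambda) \cdot f(y | \theta)}{\Pr(S | \lambda)}$. (6)

Integrating out $\lambda$ in (6) yields the joint truncated density of
$(\theta, y)$,

$I_{S_\Omega}(y) \cdot f(y | \theta) \cdot \int \frac{\pi_2(\lambda) \cdot \pi_1(\theta | \lambda)}{\Pr(S | \lambda)} \, d\lambda$, (7)

and integrating out $y$ over $S_\Omega$ yields the marginal truncated
distribution of $\theta$,

$\pi_S(\theta) = \Pr(S_\Omega | \theta) \cdot \int \frac{\pi_2(\lambda) \cdot \pi_1(\theta | \lambda)}{\Pr(S | \lambda)} \, d\lambda$. (8)

Again, dividing (7) by (8) yields $f_S(y | \theta)$ in (1), and the joint
truncated density of $(\theta, Y)$ can be expressed as
$\pi_S(\theta) \cdot f_S(y | \theta)$.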

2.2 Defining the components of saBayes inference 

We will now assume that $\pi(\theta)$ is the prior distribution and
$f(y | \theta)$ is the likelihood function, and use the relation between the
truncated and untruncated distribution of $(\theta, Y)$ in the three
generative models to define the components of saBayes inference. The
selection-adjusted likelihood is defined as $f_S(y | \theta)$ in (1), the
conditional distribution of $Y | \theta$ in the three truncated sampling
models; the selection-adjusted prior for "fixed", "random" and "mixed"
$\theta$ is $\pi_S(\theta)$, the corresponding truncated marginal distribution
of $\theta$ given in (2), (5) and (8); the selection-adjusted posterior
distribution is defined as

$\pi_S(\theta | y) = \frac{\pi_S(\theta) \cdot f_S(y | \theta)}{m_S(y)}$, (9)

for $m_S(y) = \int \pi_S(\theta) \cdot f_S(y | \theta) \, d\theta$. Thus only
for "random" $\theta$ is the selection-adjusted posterior distribution
unaffected by selection and equal to the posterior distribution
$\pi(\theta | y)$.



Remark 2.1 Note that even though we defined the selection-adjusted posterior
distribution, for "fixed", "random" and "mixed" $\theta$, according to the
conditional distribution of $\theta$ given selection and $Y = y$, Dawid's
argument, that selection has no effect on posterior distributions since
conditioning on the selection event is made redundant by conditioning on
$Y = y$, only applies for "random" $\theta$, for which the selection event $S$
is a subset of the sample space of $(\theta, Y)$. For "fixed" and "mixed"
$\theta$, for which selection is not applied to $(\theta, Y)$,
$\pi_S(\theta | y)$ is different from $\pi(\theta | y)$.



Example 2.2 $\mu_1 \sim N(0, 1 - \gamma^2)$ is a "fixed" effect,
$\mu_2 \sim N(0, \gamma^2)$ is a "random" effect, and
$Y \sim N(\mu_2 + \mu_1, 1)$. Thus for $0 \le \gamma^2 \le 1$ and
$\theta = \mu_2 + \mu_1$, the marginal density of $\theta$ is
$\pi(\theta) = \phi(\theta)$ and the conditional density of $Y | \theta$ is
$f(y | \theta) = \phi(y - \theta)$. To illustrate the difference between the
selection-adjusted posterior distributions for "random", "fixed" and "mixed"
effects we compute the selection-adjusted posterior mean of $\theta$ for the
selection rule $S_\Omega = \{y : y > 0\}$, for $\gamma^2 = 1$, 0 and 0.5.

For $\gamma^2 = 1$, $\theta = \mu_2$ is a "random" effect whose
selection-adjusted posterior distribution, given by

$\pi_S(\theta | y) \propto e^{-\theta^2/2} \cdot e^{-(\theta - y)^2/2} \propto e^{-(\theta - y/2)^2/(2 \cdot (1/2))}$,

is $N(y/2, 1/2)$ for any selection criterion. Thus $E(\theta | y = 1) = 0.5$.

For $\gamma^2 = 0$, $\theta = \mu_1$ is a "fixed" effect. The
selection-adjusted posterior distribution is given by

$\pi_S(\theta | y) \propto e^{-\theta^2/2} \cdot e^{-(\theta - y)^2/2} / \Pr(Y > 0 | \theta)$.

As $\Pr(Y > 0 | \theta)$ increases in $\theta$, dividing by it stochastically
decreases the posterior distribution of $\theta$, and thus
$E(\theta | y = 1) = 0.10$.

$\gamma^2 = 0.5$ yields a "mixed" effect truncated sampling model:
$\mu_1 \sim N(0, 1/2)$ is the "fixed" hyperparameter, $\theta | \mu_1$ is the
$N(\mu_1, 1/2)$ "random" effect, and the selection-adjusted posterior
distribution is given by

$\pi_S(\theta | y) \propto e^{-(\theta - y)^2/2} \cdot \int e^{-\mu_1^2/(2 \cdot (1/2))} \cdot e^{-(\theta - \mu_1)^2/(2 \cdot (1/2))} / \Pr(Y > 0 | \mu_1) \, d\mu_1$.

As $Y | \mu_1$ is $N(\mu_1, 3/2)$, in this case the selection adjustment is
weaker, thus $E(\theta | y = 1) = 0.33$.
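The three posterior means in Example 2.2 can be checked by brute-force integration on a grid. This is a sketch, not the computation used in the paper: the grid limits, the step and the function names are our assumptions, and a plain Riemann sum stands in for exact integration. It should return values close to those quoted above (0.5, roughly 0.1 and 0.33).

```python
import math

STEP = 0.05
GRID = [-8.0 + STEP * k for k in range(321)]  # evenly spaced points on [-8, 8]


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def Phi(x):
    """Standard Normal cdf, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def grid_mean(weight):
    """Mean of the density proportional to weight(theta), by a Riemann sum."""
    w = [weight(t) for t in GRID]
    return sum(t * v for t, v in zip(GRID, w)) / sum(w)


def mean_random(y):
    # gamma^2 = 1: posterior proportional to prior times likelihood.
    return grid_mean(lambda t: normal_pdf(t, 0.0, 1.0) * normal_pdf(y, t, 1.0))


def mean_fixed(y):
    # gamma^2 = 0: prior times likelihood divided by Pr(Y > 0 | theta) = Phi(theta).
    return grid_mean(
        lambda t: normal_pdf(t, 0.0, 1.0) * normal_pdf(y, t, 1.0) / Phi(t)
    )


def mean_mixed(y, gamma2=0.5):
    # 0 < gamma^2 < 1: mu_1 ~ N(0, 1 - gamma^2) is "fixed" and
    # theta | mu_1 ~ N(mu_1, gamma^2) is "random", so Y | mu_1 ~ N(mu_1, gamma^2 + 1).
    sd_y = math.sqrt(gamma2 + 1.0)
    a = [normal_pdf(m1, 0.0, 1.0 - gamma2) / Phi(m1 / sd_y) for m1 in GRID]

    def weight(t):
        inner = sum(ak * normal_pdf(t, m1, gamma2) for ak, m1 in zip(a, GRID))
        return normal_pdf(y, t, 1.0) * inner

    return grid_mean(weight)


means = {"random": mean_random(1.0), "fixed": mean_fixed(1.0), "mixed": mean_mixed(1.0)}
```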
Example 2.3 Notice that Senn's example of providing inference for the most
active compound is a selective inference problem in which the parameter is the
vector of effects of the $m$ pharmaceuticals, $\mu = \{\mu_1, \ldots, \mu_m\}$;
$\mu_1, \ldots, \mu_m$ are iid $N(\lambda, \gamma^2)$ for
$\lambda \sim N(0, 1 - \gamma^2)$; the data is $Y = \{Y_1, \ldots, Y_m\}$ with
$Y_i \sim N(\mu_i, 4)$; and inference is provided for $h(\mu) = \mu_i$ only if
$S_\Omega = \{y : y_i = \max_{j = 1, \ldots, m} y_j\}$ occurs. Senn (2008)
concludes that selection has no effect on the posterior distribution of
$\mu_{i^*}$ because in his analysis $\mu$ is a "random" effect. To show that
Bayesian inference may be affected by selection, we compute the
selection-adjusted posterior mean of $h(\mu) = \mu_2$ for $m = 2$ and
$y = (0, 2)$, for "mixed" and "fixed" $\mu$.

To define the "mixed" $\mu$, we assume that $\lambda$ is a "fixed" effect and
$\mu | \lambda$ is a "random" effect. However, since in this example
$\Pr(S_\Omega | \lambda) = \Pr(S_\Omega) = 0.5$, the "mixed" effect model
truncated joint density defined in (7) reduces to the "random" effect joint
density in (4). Thus in this case the conditional distribution of $\mu_2$ is
unaffected by selection. We use Expression (4) in Senn (2008) to compute the
conditional mean of $\mu_2$ for the case of "random" and "mixed" $\mu$. For
$\gamma^2 = 1$ it equals 0.4 and for $\gamma^2 = 0.5$ it equals 0.384.

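The selection-adjusted joint density of $\mu$ for "fixed" $\lambda$ and $\mu$
is given by

$\pi_S(\mu_1, \mu_2 | y = (0, 2)) \propto \frac{\pi(\mu_1, \mu_2) \cdot e^{-(0 - \mu_1)^2/(2 \cdot 4)} \cdot e^{-(2 - \mu_2)^2/(2 \cdot 4)}}{\Pr(Y_2 > Y_1 | \mu_1, \mu_2)}$,

where $\pi(\mu_1, \mu_2)$ is the joint prior density of $(\mu_1, \mu_2)$
implied by the hierarchical model. In this case the selection adjustment
increases the posterior density of $\mu$ values with $\mu_2 < \mu_1$, thereby
stochastically decreasing the marginal posterior distribution of $\mu_2$. For
$\gamma^2 = 1$ the conditional mean of $\mu_2$ is 0.164 and for
$\gamma^2 = 0.5$ it is 0.257.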



2.3 saBayes inference in the random effect model 

Using the terminology suggested by Box and Tiao, we call the model for
$\theta = (\theta_1, \ldots, \theta_m)$ and $Y = (Y_1, \ldots, Y_m)$, in which
$\theta_i$ are iid $\pi(\theta_i)$ and $Y_i | \theta_i$ are independent
$f(y_i | \theta_i)$, a random effect model.

In the random effect model $\theta$ can be a "random" effect, a "fixed"
effect, and even a "mixed" effect when there are iid "fixed" $\lambda_i$ for
which $\theta_i | \lambda_i$ are independent "random" effects. In any case the
joint distribution of $(\theta, Y)$ is

$\pi(\theta) \cdot f(y | \theta) = \prod_{i=1}^m \pi(\theta_i) \cdot \prod_{i=1}^m f(y_i | \theta_i)$. (10)
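In selective inference for $h(\theta) = \theta_i$ with
$S_\Omega = \{y : y_i \in S_{marg}\}$, incorporating (10) into (3) and
integrating over
$\theta^{(i)} = \{\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_m\}$
and the corresponding components $y^{(i)}$ of $y$ yields the
selection-adjusted joint distribution of $(\theta_i, Y_i)$ for "fixed"
$\theta$,

$\int \frac{I_{S_\Omega}(y) \cdot \pi(\theta) \cdot f(y | \theta)}{\Pr(S_\Omega | \theta)} \, d\theta^{(i)} \, dy^{(i)} = \frac{I_{S_{marg}}(y_i) \cdot \pi(\theta_i) \cdot f(y_i | \theta_i)}{\Pr(S_{marg} | \theta_i)}$. (11)

Incorporating (10) into (4) and integrating in the same way reveals that the
selection-adjusted joint distribution of $(\theta_i, Y_i)$ for "random"
$\theta$ is

$\frac{I_{S_{marg}}(y_i) \cdot \pi(\theta_i) \cdot f(y_i | \theta_i)}{\Pr(S_{marg})}$, (12)

and incorporating (10) into (7) and integrating in the same way, the
selection-adjusted joint distribution of $(\theta_i, Y_i)$ for "mixed"
$\theta$ is

$I_{S_{marg}}(y_i) \cdot f(y_i | \theta_i) \cdot \int \frac{\pi_2(\lambda_i) \cdot \pi_1(\theta_i | \lambda_i)}{\Pr(S_{marg} | \lambda_i)} \, d\lambda_i$. (13)
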

2.3.1 The non-exchangeable random effect model 

The non-exchangeable random effect model is a generalization of the random
effect model for situations in which $\theta_i$ are distinct values expected
to bear no strong relationship one to another, i.e. situations for which the
non-informative prior suggested by Box and Tiao is the fixed effect model. In
the non-exchangeable random effect model $\theta_i$ are independent
$\pi^i(\theta_i)$ and $Y_i | \theta_i$ are independent $f(y_i | \theta_i)$.
Thus the joint distribution of $(\theta, Y)$ is

$\pi(\theta) \cdot f(y | \theta) = \prod_{i=1}^m \pi^i(\theta_i) \cdot \prod_{i=1}^m f(y_i | \theta_i)$. (14)

The marginal distribution of $(\theta_i, Y_i)$ is

$\pi^i(\theta_i) \cdot f(y_i | \theta_i)$.

But in selective inference for $h(\theta) = \theta_i$ with
$S_\Omega = \{y : y_i \in S_{marg}\}$, the selection-adjusted joint
distribution of $(\theta_i, Y_i)$ for "fixed" $\theta$ is

$\frac{I_{S_{marg}}(y_i) \cdot \pi^i(\theta_i) \cdot f(y_i | \theta_i)}{\Pr(S_{marg} | \theta_i)}$. (15)

Example 2.4 Notice that $(\theta, Y)$ in Example 1.1 were generated by the
random effect model in which $\theta_1, \ldots, \theta_{100,000}$ are
independently drawn from

$\pi(\theta_i) = 0.9 \cdot \pi_1(\theta_i | \lambda = 10) + 0.1 \cdot \pi_1(\theta_i | \lambda = 1)$ (16)

and $Y_i | \theta_i$ are independently drawn from
$f(y_i | \theta_i) = \phi(y_i - \theta_i)$. Figure 1 is a scatter plot of the
932 $(\theta_i, y_i)$ with $|y_i| > 3.111$; Figure 4 displays the 470
components with $y_i > 3.111$. For comparison, in the comparable
non-exchangeable random effect model: for $i = 1, \ldots, 90000$,
$\theta_i \sim \pi_1(\theta | \lambda = 10)$ and for
$i = 90001, \ldots, 100000$, $\theta_i \sim \pi_1(\theta | \lambda = 1)$.

It is important to note that defining $\theta$ as either a "random", "fixed"
or "mixed" effect changes the truncated distribution of $(\theta, Y)$;
however, it has no effect on the distribution of $(\theta, Y)$ sampled in
Example 1.1. To observe the difference between the truncated distributions we
sampled 1000 realizations of $(\theta, Y)$ from each truncated distribution
for $h(\theta) = \theta_1$ with $S_\Omega = \{y : |y_1| > 3.111\}$. Figure 2
displays scatter plots of $(\theta_1, Y_1)$ from the realizations of
$(\theta, y)$ with $y_1 > 3.111$. The left panel is the scatter plot for the
"random" $\theta$ model. In this case the joint density of $(\theta_1, Y_1)$,
given in (12), is identical to the joint density of $(\theta_i, Y_i)$
displayed in Figures 1 and 4. The right panel is the scatter plot for the
"fixed" $\theta$ model with joint density given in (11). In this model
$\pi_S(\theta_1 | y_1)$, the selection-adjusted marginal posterior
distribution of $\theta_1$, is shrunk towards 0. For the "mixed" $\theta$
model, $\lambda_i$ are iid "fixed" effects sampled from $\{10, 1\}$ with
probabilities 0.90 and 0.10 and $\theta_i | \lambda_i$ are independent
"random" effects with conditional density $\pi_1(\theta_i | \lambda_i)$. Thus
the joint density of $(\theta_1, Y_1)$ given in (13) is

$\phi(y_1 - \theta_1) \cdot \left\{ \frac{0.9 \cdot \pi_1(\theta_1 | \lambda_1 = 10)}{\Pr(|Y_1| > 3.111 | \lambda_1 = 10)} + \frac{0.1 \cdot \pi_1(\theta_1 | \lambda_1 = 1)}{\Pr(|Y_1| > 3.111 | \lambda_1 = 1)} \right\}$.

In this model the shrinking of $\pi_S(\theta_1 | y_1)$ towards 0 is weaker
than in the "fixed" $\theta$ model.



2.4 saBayes inference for non-informative priors 

Our generative model results, regarding the effect of selection on the
marginal distribution of $\theta$, do not apply when $\pi(\theta)$ is a
non-informative prior. Non-informative prior distributions are used to allow
conditional analysis on $\theta$ when no prior information on $\theta$ is
available (Berger 1985, Section 3.3.1). As $Y$ also provides all the
information on $\theta$ in the truncated data problem, we argue that
$\pi_S(\theta)$, the prior distribution used for saBayes inference, should
also be a non-informative prior. We further argue that the lack of prior
knowledge on $\theta$ may affect our decision to provide selective inference,
but the opposite is not true: the decision to provide inference only for
certain values of $Y$ should have no effect on the non-informative prior
elicited for $\theta$. We therefore propose setting
$\pi_S(\theta) = \pi(\theta)$, thus the selection-adjusted posterior
distribution is given by

$\pi_S(\theta | y) \propto \pi(\theta) \cdot f_S(y | \theta)$. (17)

This means that if $\theta$ is assigned a non-informative prior then it is
treated as a "fixed" effect.

3 Selection-adjusted Bayesian inference 

To formally define saBayes inference, we assume that the inference involves an
action $\delta(Y)$ associated with a loss function $L(h(\theta), \delta)$. As
selective inference is provided only for selected $(\theta, Y)$,
$r_S(\delta)$, the expected loss incurred by $\delta(Y)$ in selective
inference, which we call the saBayes risk, can be expressed as the Bayes risk
for the truncated distribution of $(\theta, Y)$,

$r_S(\delta) = \int_{\theta \in \Theta} \int_{y \in S_\Omega} L(h(\theta), \delta(y)) \cdot \pi_S(\theta) \cdot f_S(y | \theta) \, dy \, d\theta$

$= \int_{y \in S_\Omega} \left\{ \int_{\theta \in \Theta} L(h(\theta), \delta(y)) \cdot \pi_S(\theta | y) \, d\theta \right\} \cdot m_S(y) \, dy$. (18)

Thus the Bayes rules in selective inference are the actions minimizing the
selection-adjusted posterior expected loss

$\rho_S(\delta, y) = \int L(h(\theta), \delta(y)) \cdot \pi_S(\theta | y) \, d\theta$,

and in general Bayesian selective inference should be based on
$\pi_S(h(\theta) | y)$, the selection-adjusted posterior distribution of
$h(\theta)$. Thus selection-adjusted $1 - \alpha$ credible intervals for
$h(\theta)$ are subsets $A$ for which
$\Pr_{\pi_S(h(\theta) | y)}(h(\theta) \in A) = 1 - \alpha$, and the posterior
mean or mode of $\pi_S(h(\theta) | y)$ can serve as selection-adjusted point
estimators for $h(\theta)$.

Example 3.1 We provide saBayes inference for the data simulated in Example
1.1 for $h(\theta) = \theta_{12647}$ with $S_\Omega = \{y : |y_{12647}| > 3.111\}$, and for $h(\theta) = \theta_{90543}$
with $S_\Omega = \{y : |y_{90543}| > 3.111\}$. We use two prior models for $\theta$ in our analysis.
In the first model $\theta$ is a "random" effect generated by the random effect model,
with $\pi(\theta_i)$ in (|T6l). In this model the saBayes posterior distribution of $\theta_i$ is
proportional to the distribution of $(\theta_i, Y_i)$ in p^|)

$$\pi_S(\theta_i \mid y_i) \propto \pi(\theta_i) \cdot \phi(y_i - \theta_i). \tag{19}$$

In the second model $\theta$ is generated by the non-exchangeable random effect
model with unknown $\pi_i(\theta_i)$. Thus following Box and Tiao we use the flat
non-informative prior $\pi_i(\theta_i) = 1$ in our analysis. The flat prior unadjusted posterior
distribution of $\theta_i$ is

$$\pi(\theta_i \mid y_i) \propto \phi(y_i - \theta_i). \tag{20}$$

Whereas the non-informative prior saBayes posterior distribution of $\theta_i$ is
proportional to the distribution of $(\theta_i, Y_i)$ for "fixed" $\theta$ in (|TT|)

$$\pi_S(\theta_i \mid y_i) \propto \phi(y_i - \theta_i) / \Pr(S_\Omega \mid \theta_i), \tag{21}$$

with $\Pr(S_\Omega \mid \theta_i) = \Phi(-3.111 - \theta_i) + 1 - \Phi(3.111 - \theta_i)$.

Figure 3 displays the posterior distributions of $\theta_{12647}$ (left panel) and $\theta_{90543}$
(right panel). The flat prior unadjusted posterior mean and mode of $\theta_{12647}$ equal
$y_{12647} = 3.40$, and the 0.95 credible interval is [1.44, 5.36]. The selection adjustment
shrinks the posterior distributions of $\theta_{12647}$ towards 0. The "random" $\theta$ saBayes
posterior distribution of $\theta_{12647}$ is bimodal with a spike at 0 and a mode at 2.40;
the posterior mean is 1.68 and the 0.95 credible interval is [-0.11, 4.20]. The flat
prior saBayes posterior mode of $\theta_{12647}$ is 0.74, the posterior mean is 1.88, and
the 0.95 credible interval is [-0.04, 4.64].

The flat prior unadjusted posterior mean and mode of $\theta_{90543}$ equal $y_{90543} = 5.59$,
and the 0.95 credible interval is [3.63, 7.55]. The much larger $y_{90543}$ produces
a non-negligible likelihood only for $\theta_i$ values that correspond to almost certain
selection. Thus in this case the selection adjustment is small: the flat prior
saBayes posterior mode is 5.57, the posterior mean is 5.48, and the 0.95 credible
interval is [3.26, 7.52]. The shrinking towards 0 in the "random" $\theta$ model posterior
is stronger: the posterior mean and mode are 4.59 and the 0.95 credible interval
is [2.62, 6.55].

Remark 3.2 It is important to note that, as extremely unlikely values of $\theta$ with
an extremely small selection probability can have a large selection-adjusted
likelihood, the selection-adjusted posterior distribution can be very different
from the unadjusted posterior distribution. The selection-adjusted likelihood
can even be non-informative and improper: if the selection rule only includes
the observed value $Y = y$ then the selection-adjusted likelihood is constant for
all parameter values. Example 3.3 illustrates this phenomenon, and shows how it is
affected by the choice of the selection rule and that it is not unique to Bayesian
selective inference. In this paper we employ selection rules whose selection
probability is minimized at $\theta = 0$ and approaches 1 for large $|\theta|$, thus the selection
adjustments shrink the likelihood towards 0.
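
The flat prior saBayes posterior of Example 3.1 can be evaluated numerically on a grid. The sketch below is an illustration, not the code used for the paper: it assumes the posterior $\phi(y - \theta) / \Pr(|Y| > 3.111 \mid \theta)$, a grid on $[-15, 20]$, and equal-tail credible intervals, and the helper names (`flat_prior_sabayes`, `summarize`) are hypothetical. The resulting summaries can be compared with the values quoted in Example 3.1.

```python
import math

SQRT2 = math.sqrt(2.0)


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Standard normal upper tail 1 - Phi(x)."""
    return 0.5 * math.erfc(x / SQRT2)


def selection_prob(theta, c=3.111):
    """Pr(|Y| > c | theta) for Y ~ N(theta, 1)."""
    return norm_sf(c - theta) + norm_sf(c + theta)


def flat_prior_sabayes(y, c=3.111, lo=-15.0, hi=20.0, n=35000):
    """Normalised grid approximation of phi(y - theta) / Pr(S | theta)."""
    step = (hi - lo) / n
    grid = [lo + k * step for k in range(n + 1)]
    dens = [norm_pdf(y - t) / selection_prob(t, c) for t in grid]
    total = sum(dens) * step
    return grid, [d / total for d in dens], step


def summarize(grid, dens, step, level=0.95):
    """Posterior mode, mean and equal-tail credible interval on the grid."""
    mode = grid[max(range(len(dens)), key=dens.__getitem__)]
    mean = sum(t * d for t, d in zip(grid, dens)) * step
    tail = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for t, d in zip(grid, dens):
        cum += d * step
        if lower is None and cum >= tail:
            lower = t
        if upper is None and cum >= 1.0 - tail:
            upper = t
    return mode, mean, (lower, upper)


mode_a, mean_a, ci_a = summarize(*flat_prior_sabayes(3.40))
mode_b, mean_b, ci_b = summarize(*flat_prior_sabayes(5.59))
print("y = 3.40:", round(mode_a, 2), round(mean_a, 2), ci_a)
print("y = 5.59:", round(mode_b, 2), round(mean_b, 2), ci_b)
```

A grid is used here because the two-sided selection probability never becomes very small, so the ratio is numerically stable without working on the log scale.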

Example 3.3 To illustrate the potential non-robustness of the selection
adjustment we derive the non-informative prior saBayes posterior distribution
of $\theta_{12647}$, given in (21), for an alternative one-sided selection rule
$S_\Omega = \{y : y_{12647} > 3.111\}$. In this case the selection-adjusted posterior is stochastically
smaller and much more diffuse. The selection-adjusted posterior mode is 0.19
and the selection-adjusted posterior mean is -2.87; the 0.95 selection-adjusted
credible interval is [-15.41, 3.91]; and an unlikely value $\theta_{12647} = -5.87$, with
unadjusted likelihood $\phi(-5.87 - 3.40) = 8.73 \times 10^{-20}$ and selection
probability $\Phi(-5.87 - 3.111) = 1.34 \times 10^{-19}$, has the same selection-adjusted
posterior density as the unadjusted posterior mode $\theta_{12647} = 3.40$, i.e.
$\pi_S(\theta_{12647} = -5.87 \mid Y_{12647} = 3.40) = \pi_S(\theta_{12647} = 3.40 \mid Y_{12647} = 3.40)$.
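
The equality of the two selection-adjusted posterior densities in Example 3.3 can be checked directly. The sketch below assumes the unnormalised one-sided density $\phi(y - \theta) / \Pr(Y > 3.111 \mid \theta)$ from (21); the function name `adjusted_density_one_sided` is a hypothetical helper, and `math.erfc` is used so that the extreme tail probability does not lose precision.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Standard normal upper tail 1 - Phi(x), accurate far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def adjusted_density_one_sided(theta, y=3.40, c=3.111):
    """Unnormalised flat prior saBayes density for the rule Y > c."""
    return norm_pdf(y - theta) / norm_sf(c - theta)


likelihood = norm_pdf(3.40 - (-5.87))   # phi(-9.27)
sel_prob = norm_sf(3.111 - (-5.87))     # Pr(Y > 3.111 | theta = -5.87)
ratio = adjusted_density_one_sided(-5.87) / adjusted_density_one_sided(3.40)
print(likelihood, sel_prob, ratio)
```

A ratio close to 1 means that dividing by the tiny selection probability lifts the density at $\theta = -5.87$ to the level of the density at the unadjusted mode.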

This non-robustness is not unique to Bayesian selective inference. To
construct selection-adjusted frequentist 0.95 confidence intervals for $\theta_{12647}$ we
begin by testing, at level 0.05 and for each value of $\theta_0$, the null hypothesis that
$\theta_{12647} = \theta_0$. The sampling distribution of $Y_{12647} \mid \theta_{12647} = \theta_0$ is $f_S(y_{12647} \mid \theta_0)$
in ([lj for $\theta_{12647} = \theta_0$. Thus we reject the null hypothesis that $\theta_{12647} = \theta_0$ if
$y_{12647}$ is smaller than the 0.025 quantile or larger than the 0.975 quantile of
$f_S(y_{12647} \mid \theta_0)$, and the 0.95 confidence interval for $\theta_{12647}$ is the set of $\theta_0$ values
for which the null hypothesis that $\theta_{12647} = \theta_0$ is not rejected for $y_{12647} = 3.40$.
For the selection rule $S_\Omega = \{y : |y_{12647}| > 3.111\}$ the 0.95 confidence interval for
$\theta_{12647}$ is [-0.37, 5.03], while for $S_\Omega = \{y : y_{12647} > 3.111\}$ the 0.95 confidence
interval for $\theta_{12647}$ is [-9.44, 5.03].
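
The test inversion described above can be sketched as a root-finding problem. The code below is an illustrative implementation under stated assumptions: $Y \mid \theta_0 \sim N(\theta_0, 1)$ truncated to the selection region, the observed value is $y = 3.40$, and the endpoints are the $\theta_0$ values at which the conditional upper or lower tail probability of $y$ equals 0.025. The helpers `upper_tail`, `bisect` and `selection_adjusted_ci` are hypothetical names.

```python
import math


def norm_sf(x):
    """Standard normal upper tail 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def upper_tail(theta0, y, c, two_sided):
    """Pr(Y >= y | Y selected, theta0), for an observed y > c."""
    sel = norm_sf(c - theta0)
    if two_sided:
        sel += norm_sf(c + theta0)  # add Pr(Y < -c | theta0)
    return norm_sf(y - theta0) / sel


def bisect(fun, lo, hi, tol=1e-8):
    """Root of a continuous function that changes sign once on [lo, hi]."""
    f_lo = fun(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def selection_adjusted_ci(y=3.40, c=3.111, two_sided=True, level=0.95):
    """Confidence interval obtained by inverting equal-tail conditional tests."""
    a = (1.0 - level) / 2.0
    lower = bisect(lambda t: upper_tail(t, y, c, two_sided) - a, -30.0, y)
    upper = bisect(lambda t: (1.0 - upper_tail(t, y, c, two_sided)) - a, y, 30.0)
    return lower, upper


lo2, up2 = selection_adjusted_ci(two_sided=True)
lo1, up1 = selection_adjusted_ci(two_sided=False)
print("two-sided rule:", round(lo2, 2), round(up2, 2))
print("one-sided rule:", round(lo1, 2), round(up1, 2))
```

Bisection is sufficient here because the conditional tail probability is monotone in $\theta_0$ for the truncated normal family; the bracketing interval $[-30, 30]$ is an assumption chosen to be wide enough for this example.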

3.1 FCR control in the random effect model 

We define the FCR for $(\theta, Y)$ generated by the random effect model. The initial
set of parameters is $\theta_1 \cdots \theta_m$. The subset of selected parameters is
$\{\theta_i : y_i \in S_{marg}\}$, and a marginal confidence interval $A_{marg}(y_i)$ is constructed for each
selected $\theta_i$. For $i = 1 \cdots m$, let $R_i = I(Y_i \in S_{marg})$ and
$V_i = I(Y_i \in S_{marg}, \theta_i \notin A_{marg}(Y_i))$. The indicators $R_i$ and $V_i$ are defined for the joint (untruncated)
distribution of $(\theta, Y)$. Thus regardless of whether $\theta$ is "random" or "fixed" the
conditional density of $(\theta_i, Y_i)$ given $R_i = 1$ is

$$\frac{I(y_i \in S_{marg}) \cdot \pi(\theta_i) \cdot f(y_i \mid \theta_i)}{\Pr(Y_i \in S_{marg})}. \tag{22}$$



In a single realization of $(\theta, Y)$, $R = \sum_i R_i$ is the number of selected
parameters, $V = \sum_i V_i$ is the number of non-covering confidence intervals, and
$FCP = V / \max(1, R)$ is the false coverage-statement proportion. In Benjamini
and Yekutieli (2005) FCR refers to a frequentist FCR, which corresponds to
$E_{Y \mid \theta} FCP$ for $(\theta, Y)$ generated by a random effect model. In this paper FCR
will refer to a Bayesian FCR, defined as $E_{\theta, Y} FCP$. We also consider the positive
FCR, $pFCR = E_{\theta, Y}(FCP \mid R > 0)$.

To explain the relation between the FCR incurred in parameter selection
in the random effect model and saBayes inference, we consider $(\tilde\theta, \tilde Y)$ with the
same distribution as $(\theta, Y)$, but with $\tilde\theta$ being a "random" effect. In parameter
selection the identity of the selected genes is determined according to $y_1 \cdots y_m$.
Therefore constructing a marginal confidence interval for $\tilde\theta_i$ when it is selected
can be expressed as providing selective inference for $h(\tilde\theta) = \tilde\theta_i$, with
$S_\Omega = \{\tilde y : \tilde y_i \in S_{marg}\}$ and with $\tilde\theta$ viewed as a "random" effect. This explains why (22)
is equal to the "random" effect selection-adjusted distribution of $(\tilde\theta_i, \tilde Y_i)$, given
in p'2p. and also implies that the conditional density of $\theta_i$ given $R_i = 1$ and
$Y_i = y_i$ is equal to the "random" $\theta$ selection-adjusted posterior

$$\tilde\pi_S(\tilde\theta_i \mid \tilde y_i) \propto \pi(\tilde\theta_i) \cdot f(\tilde y_i \mid \tilde\theta_i). \tag{23}$$

As $\tilde\theta$ is per construction a "random" effect, the conditional probability given
$R_i = 1$ and $Y_i = y_i$ that $\theta_i \notin A_{marg}(y_i)$ can be expressed as the
selection-adjusted posterior expected loss in selective inference for $h(\tilde\theta) = \tilde\theta_i$ with
$S_\Omega = \{\tilde y : \tilde y_i \in S_{marg}\}$ for the loss function $L(\tilde\theta_i, A_i(\tilde y)) = I(\tilde\theta_i \notin A_{marg}(\tilde y_i))$

$$\tilde\rho(\tilde y_i) = \int I(\tilde\theta_i \notin A_{marg}(\tilde y_i)) \cdot \tilde\pi(\tilde\theta_i \mid \tilde y_i) \, d\tilde\theta_i,$$

and the conditional probability given that $R_i = 1$ that $\theta_i \notin A_{marg}(y_i)$ is the
corresponding saBayes risk

$$\tilde r_S = E_{\tilde m_S(\tilde y_i)} \, \tilde\rho(\tilde y_i). \tag{24}$$

Proposition 3.4 The pFCR in the random effect model is equal to the "random"
$\theta$ saBayes risk $\tilde r_S$. In particular, if $A_{marg}(y_i)$ are $1 - \alpha$ credible intervals
for $\theta_i$ based on $\pi_S(\theta_i \mid y_i)$ in (23) then $pFCR = \alpha$.
Proof. In the random effect model $R_i$ are independent and $\{V_i : R_i = 1\}$ are
mutually independent with $\Pr(V_i = 1 \mid R_i = 1) = \tilde r_S$. Thus for each value of $R = k$,
$V \sim Binom(k, \tilde r_S)$, and conditioning on $R > 0$ yields $pFCR = \tilde r_S$. Lastly, for
$1 - \alpha$ selection-adjusted credible intervals based on $\pi_S(\theta_i \mid y_i)$, $\tilde r_S = \tilde\rho(y_i) = \alpha$. □
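
The conditioning step of the proof can be written out explicitly. The display below is a sketch that uses only the binomial statement $V \mid R = k \sim Binom(k, \tilde r_S)$ from the proof and the definition $pFCR = E(FCP \mid R > 0)$, with $FCP = V/R$ when $R > 0$:

```latex
\begin{align*}
pFCR &= E\left( \frac{V}{R} \,\middle|\, R > 0 \right)
      = \sum_{k=1}^{m} \Pr(R = k \mid R > 0) \cdot \frac{E(V \mid R = k)}{k} \\
     &= \sum_{k=1}^{m} \Pr(R = k \mid R > 0) \cdot \frac{k \, \tilde r_S}{k}
      = \tilde r_S .
\end{align*}
```

The result does not depend on the distribution of $R$, which is why the pFCR equals $\tilde r_S$ for any number of parameters $m$.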

Remark 3.5 We have shown that in the random effect model, regardless of
whether $\theta$ is "random", "fixed" or "mixed", the pFCR equals the "random"
$\theta$ saBayes risk. As $pFCR \ge$ Bayesian-FCR, the "random" $\theta$ saBayes risk can
serve as a conservative estimate for Bayesian-FCR. In particular, for large $R$ the
sampling dispersion of FCP and of $V/ER$ is small, thus the FCP, Bayesian-FCR,
frequentist-FCR, pFCR and also $EV/ER$, which we discuss in the context of specifying
selection rules in the non-exchangeable random effect model, are almost the
same.

Remark 3.6 Recall that if $\pi(\theta_i)$ is a non-informative prior then the selection
adjusted posterior distribution for "random" $\theta$ is actually the "fixed" $\theta$ selection
adjusted posterior

$$\pi_S(\theta_i \mid y_i) \propto \pi(\theta_i) \cdot f(y_i \mid \theta_i) / \Pr(S_{marg} \mid \theta_i). \tag{25}$$

As credible intervals based on non-informative priors are expected to provide
approximate coverage probability, when $\pi(\theta_i)$ is a non-informative prior then
$1 - \alpha$ credible intervals based on $\pi_S(\theta_i \mid y_i)$ in (25) yield $\rho(y_i) \approx \alpha$. Thus Proposition
3.4 implies that for non-informative priors the "fixed" $\theta$ marginal $1 - \alpha$ credible
intervals yield approximate level $\alpha$ FCR control.

Example 3.7 Figure 4 displays $(\theta_i, y_i)$ generated in Example 1.1 with $y_i > 3.111$.
The red and green dashed curves are the 0.95 confidence intervals from
Figure 1. The red curves also correspond to the 0.95 credible intervals for $\theta_i$ for
the flat prior unadjusted posterior (20). The blue curves are the 0.95 saBayes
credible intervals for the flat prior selection-adjusted posterior in (21), and the
light blue curves are the 0.95 saBayes credible intervals for the "random" $\theta$
selection-adjusted posterior in (19).

According to Proposition 3.4 the pFCR for "random" $\theta$ 0.95 saBayes credible
intervals constructed for selected $\theta_i$ is 0.05. In the simulation the FCP for
the 932 selected $\theta_i$ was 0.047. As the flat prior unadjusted credible intervals
are 0.95 frequentist confidence intervals, we expect the coverage proportion for
all 100,000 $\theta_i$ to be close to 0.95. In Example 1.1 we have seen that these CIs
cover 95,089 of the 100,000 $\theta_i$. From a Bayesian perspective these are equal
tail credible intervals based on a minimally-informative prior known to provide
good frequentist performance (Carlin and Louis, 1996, Section 4.3). We have
also seen that the FCP for the 932 selected parameters is 0.346. Benjamini
and Yekutieli (2005) explain this phenomenon from a frequentist perspective.
Remark 3.6 offers a Bayesian explanation: in order to provide approximate
FCR control for non-informative priors the credible intervals should be based
on the "fixed" $\theta$ selection adjusted posterior in (|20p , rather than the "random"
$\theta$ selection adjusted posterior in (IT§1) . And indeed, the FCP of the credible
intervals based on (l20l) was 0.040.
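
The contrast between marginal coverage and coverage after selection can be reproduced in a small simulation. The sketch below does not use the prior of Example 1.1; it assumes a hypothetical random effect model with $\theta_i \sim N(0, 1)$ and $Y_i \mid \theta_i \sim N(\theta_i, 1)$, for which the "random" $\theta$ posterior is $N(y_i/2, 1/2)$, and a hypothetical selection rule $|y_i| > 2$. It compares the FCP of the unadjusted flat prior intervals $y_i \pm 1.96$ with the FCP of the 0.95 posterior credible intervals, which Proposition 3.4 predicts to be close to 0.05.

```python
import math
import random


def simulate_fcp(m=200_000, c=2.0, seed=1):
    """Return (#selected, FCP of flat prior CIs, FCP of posterior CIs)."""
    rng = random.Random(seed)
    z = 1.959964                      # 0.975 standard normal quantile
    post_sd = math.sqrt(0.5)          # posterior sd under the N(0, 1) prior
    selected = miss_flat = miss_post = 0
    for _ in range(m):
        theta = rng.gauss(0.0, 1.0)   # "random" effect
        y = theta + rng.gauss(0.0, 1.0)
        if abs(y) <= c:               # not selected: no interval is reported
            continue
        selected += 1
        if abs(theta - y) > z:        # flat prior interval y +/- 1.96 misses
            miss_flat += 1
        if abs(theta - 0.5 * y) > z * post_sd:  # posterior interval misses
            miss_post += 1
    return selected, miss_flat / selected, miss_post / selected


n_selected, fcp_flat, fcp_post = simulate_fcp()
print(n_selected, round(fcp_flat, 3), round(fcp_post, 3))
```

In this toy model the posterior intervals keep the false coverage proportion near the nominal level among selected parameters, while the flat prior intervals, which are valid marginally, do not.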

4 Specifying FDR controlling selection rules in 
the random effect model 

In this section we present methods for specifying selection rules in cases where
the primary goal of the experiment is making statistical discoveries. As in
Section 3.1 we assume that $(\theta, Y)$ are generated by the random effect model;
$\theta_i$ is selected if $y_i \in S_{marg}$; and the inference provided for $\theta_i$ if it is selected
is declaring that it is in $A_{marg}(y_i)$. However now $A_{marg}(y_i)$ is an event that
corresponds to making a statistical discovery regarding $\theta_i$. In Senn's example of
providing inference for the most active compound, the statistical discovery that
corresponds to selecting $\theta_i$ is declaring that $\theta_i > \max_{j \ne i} \theta_j$. In
genome-wide association studies, the selected parameters are odds ratios between diseases
and genetic markers that are found to be either greater than 1 or smaller than
1.

Once declaring $\theta_i \in A_{marg}(y_i)$ corresponds to making a statistical discovery,
$R$ becomes the number of discoveries, $V$ becomes the number of false discoveries,
$V / \max(1, R) = FDP$ is the false discovery proportion, and FCR = FDR. Thus
Proposition 3.4 yields the following result.

Corollary 4.1 In the random effect model the pFDR is equal to the saBayes
risk for "random" $\theta$ in selective inference for $h_i(\theta) = \theta_i$ with
$S_\Omega = \{y : y_i \in S_{marg}\}$ and loss function $L(\theta_i, A_{marg}) = I(\theta_i \notin A_{marg}(y_i))$.

Thus to ensure level $q$ FDR control, when considering $S_{marg} = \{y_i : T(y_i) < s\}$,
we suggest choosing $s$ for which the "random" $\theta$ saBayes risk is $q$.

Furthermore, since for "random" $\theta$, for which posterior distributions are
unaffected by selection, the posterior expected loss is

$$\rho(y_i) = \int I(\theta_i \notin A_{marg}(y_i)) \cdot \pi(\theta_i \mid y_i) \, d\theta_i,$$

and the truncated marginal distribution of $y_i$ is

$$m_S(y_i) = \frac{I(y_i \in S_{marg}) \cdot \int \pi(\theta_i) f(y_i \mid \theta_i) \, d\theta_i}{\int I(y_i \in S_{marg}) \cdot \int \pi(\theta_i) f(y_i \mid \theta_i) \, d\theta_i \, dy_i},$$

for any $S_{marg}$ the saBayes risk in (18) can be expressed

$$r_S = \frac{\int I(y_i \in S_{marg}) \cdot \rho(y_i) \cdot \int \pi(\theta_i) f(y_i \mid \theta_i) \, d\theta_i \, dy_i}{\int I(y_i \in S_{marg}) \cdot \int \pi(\theta_i) f(y_i \mid \theta_i) \, d\theta_i \, dy_i}
= \frac{\int I(y_i \in S_{marg}) \cdot \rho(y_i) \cdot m(y_i) \, dy_i}{\int I(y_i \in S_{marg}) \cdot m(y_i) \, dy_i} \tag{26}$$

for $m(y_i) = \int \pi(\theta_i) f(y_i \mid \theta_i) \, d\theta_i$. Thus as the denominator in (26) is the
probability that $\theta_i$ is selected, Corollary 4.1 and Expression (26) for the "random" $\theta$
saBayes risk yield the following Neyman-Pearson Lemma type result.

Corollary 4.2 $S_{marg} = \{y_i : \rho(y_i) < s\}$ has the largest selection probability of
all selection rules with the same pFDR.

Another option is to use $\rho(y_i)$ to directly specify the selection rule, by defining

$$S_{marg} = \{y_i : \rho(y_i) < q\}. \tag{27}$$

Notice that unlike the continuum of possible credible intervals that can be
constructed for $\theta_i$, the number of possible discoveries that can be made regarding
$\theta_i$ is finite. In particular, when there is only a single possible discovery for all
values of $y_i$, i.e. $A_{marg}(y_i) = A_{marg}$, then expressing the "random" $\theta$ saBayes
risk corresponding to this discovery

$$r_S = \int \int I(\theta_i \notin A_{marg}) \cdot \frac{I(y_i \in S_{marg}) \cdot \pi(\theta_i) f(y_i \mid \theta_i)}{\Pr(S_{marg})} \, dy_i \, d\theta_i$$

$$= \int I(\theta_i \notin A_{marg}) \cdot \frac{\pi(\theta_i) \cdot \Pr(S_{marg} \mid \theta_i)}{\Pr(S_{marg})} \, d\theta_i
= \int I(\theta_i \notin A_{marg}) \cdot \pi_S(\theta_i) \, d\theta_i, \tag{28}$$

for $\pi_S(\theta_i) = \pi(\theta_i) \cdot \Pr(S_{marg} \mid \theta_i) / \Pr(S_{marg})$ the "random" $\theta$ selection-adjusted
prior density derived in ([5]), yields the following result.

Corollary 4.3 If $A_{marg}(y_i) = A_{marg}$ then the pFDR is equal to the "random"
$\theta$ selection-adjusted prior probability that $\theta_i \notin A_{marg}$.

4.1 Specifying FDR controlling selection rules in the non- 
exchangeable random effect model 

In this subsection, $(\theta, Y)$ is generated by the non-exchangeable random effect
model, $\theta_i$ is selected if $y_i \in S_{marg}$, and the inference provided for selected $\theta_i$ is
the discovery that $\theta_i \in A_{marg}(y_i)$. Let $A^1_{marg} \cdots A^D_{marg}$ denote the $D$ possible
discoveries that can be made on $\theta_i$. For $d = 1 \cdots D$, let $R^d$ denote the number of
discoveries of $A^d_{marg}$ and let $V^d$ denote the number of false discoveries of $A^d_{marg}$.
The results in this section are derived under the assumption that $A_{marg}(y_i) = A_{marg}$.
However, as $ER = ER^1 + \cdots + ER^D$ and $EV = EV^1 + \cdots + EV^D$,
they can be easily extended to the case of $D > 1$. To derive the results in this
section we consider $(\tilde\theta, \tilde Y)$ as before, but with marginal prior density

$$\tilde\pi(\tilde\theta_i) = \frac{1}{m} \sum_{j=1}^{m} \pi_j(\tilde\theta_i).$$

Lemma 4.4 For any subset $B$, $W_i = I(y_i \in S_{marg}, \theta_i \notin B)$ and
$\tilde W_i = I(\tilde y_i \in S_{marg}, \tilde\theta_i \notin B)$,

$$E \sum_{i=1}^{m} W_i = E \sum_{i=1}^{m} \tilde W_i.$$

For $B = \emptyset$, $\sum_{i=1}^{m} W_i$ is the number of discoveries $R$, while for $B = A_{marg}$,
$\sum_{i=1}^{m} W_i$ is the number of false discoveries. Therefore Lemma 4.4 implies that
$EV$, $ER$, thus also $EV/ER$, for $(\theta, Y)$ and for $(\tilde\theta, \tilde Y)$ are the same. According
to Corollary 4.3, for $(\tilde\theta, \tilde Y)$ the pFDR equals $\tilde r_S$. Thus as $FDR \le pFDR$, and
$FDR \approx EV/ER$ is approximately the same for $(\theta, Y)$ and for $(\tilde\theta, \tilde Y)$, we get the
following result.

Corollary 4.5 In the non-exchangeable random effect model selecting $\theta_i$ if
$y_i \in S_{marg}$ yields approximate level $\tilde r_S$ FDR control.

To define a general method for specifying FDR controlling selection rules for
$(\theta, Y)$ generated by the non-exchangeable random effect model with unknown
marginal priors, notice that applying empirical Bayes methods to $y_1 \cdots y_m$
actually estimates $\tilde\pi(\theta_i)$, the mixture of the (unknown) marginal densities of
$\theta_1 \cdots \theta_m$. Combining this with Corollary 4.5 implies that the FDR of any
selection rule can be approximated by $\tilde r_S$ computed by treating $(\theta, Y)$ as if it
was generated by the random effect model and using the eBayes estimate of $\tilde\pi(\theta_i)$.
Furthermore, as $ER = E\tilde R$ and $E\tilde R = m \cdot \Pr(\tilde y_i \in S_{marg})$, then also in the
non-exchangeable random effect model the selection rule $S_{marg} = \{y_i : \tilde\rho(y_i) < s\}$
yields the maximal $ER$ of all $S_{marg}$ with the same $\tilde r_S$.

Definition 4.6 Algorithm for specifying level $q$ FDR controlling selection rules
in the non-exchangeable random effect model:

1. Apply eBayes to $y_1 \cdots y_m$ to produce $\tilde\pi(\theta_i)$.

2. Use $\tilde\pi(\theta_i)$ to compute $\tilde r_S$ for any given selection rule.

3a. To specify a level $q$ FDR controlling selection rule of the form
$S_{marg} = \{y_i : T(y_i) < s\}$, for a given statistic $T(y_i)$, find $s$ for which $\tilde r_S = q$.

3b. The level $q$ FDR controlling selection rule yielding the maximal expected
number of discoveries is $S_{marg} = \{y_i : \tilde\rho(y_i) < s\}$ with $s$ for which $\tilde r_S = q$.
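
Steps 2 and 3a of the algorithm can be sketched numerically once a prior is available. The code below is an illustration under loud assumptions: step 1 is replaced by a hypothetical $N(0, \tau^2)$ prior with $\tau = 2$ (not an actual eBayes fit), the likelihood is $Y_i \mid \theta_i \sim N(\theta_i, 1)$, the loss is the directional error $I(sign(\theta_i) \ne sign(y_i))$, and the rule has the form $|y_i| > s$. All function names are hypothetical.

```python
import math

TAU = 2.0  # hypothetical prior sd standing in for the eBayes estimate (step 1)


def norm_pdf(x, sd=1.0):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def norm_cdf(x):
    """Standard normal cdf Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def marginal(y):
    """Marginal density of y: N(0, 1 + TAU^2)."""
    return norm_pdf(y, math.sqrt(1.0 + TAU ** 2))


def rho(y):
    """Posterior probability of a directional error given y.

    theta | y ~ N(w * y, w) with w = TAU^2 / (1 + TAU^2).
    """
    w = TAU ** 2 / (1.0 + TAU ** 2)
    return norm_cdf(-abs(y) * math.sqrt(w))


def sabayes_risk(s, hi=40.0, n=4000):
    """Step 2: average of rho over the selected region |y| > s.

    The model is symmetric, so integrating over y > s is sufficient.
    """
    step = (hi - s) / n
    num = den = 0.0
    for k in range(n):
        y = s + (k + 0.5) * step      # midpoint rule
        w = marginal(y)
        num += rho(y) * w
        den += w
    return num / den


def solve_decreasing(fun, target, lo, hi, tol=1e-6):
    """Bisection for fun(x) = target when fun is decreasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fun(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q = 0.10
s_risk = solve_decreasing(sabayes_risk, q, 0.0, 10.0)  # step 3a: risk equals q
s_local = solve_decreasing(rho, q, 0.0, 10.0)          # rule (27): rho(y) < q
print(round(s_risk, 3), round(s_local, 3))
```

Because the risk is an average of $\tilde\rho$ over the selected region, the threshold at which the risk equals $q$ is smaller than the threshold at which $\tilde\rho$ itself equals $q$, so the risk-based rule selects more parameters than rule (27).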



Example 4.7 In Example 1.1 selection is associated with $D = 2$ directional
discoveries. According to Corollary 4.1 the pFDR for the selection rule $|y_i| > s$ is
equal to the "random" $\theta$ saBayes risk for the loss function $I(sign(\theta_i) \ne sign(y_i))$

$$E_{m_S(y)} \{ I(y < -s) \cdot \Pr_{\pi(\theta \mid y)}(\theta > 0) + I(y > s) \cdot \Pr_{\pi(\theta \mid y)}(\theta < 0) \}. \tag{29}$$

Recall that $|y_i| > 3.111$ was used to ensure that the directional-FDR is less
than 0.1. For $s = 3.111$ the saBayes risk (29) is 0.070, whereas setting $s = 2.915$
yields the selection criterion for which the saBayes risk is 0.10. The posterior
expected loss corresponding to the directional-FDR is

$$\tilde\rho(y_i) = \Pr_{\pi(\theta \mid y)}(sign(\theta_i) \ne sign(y_i)).$$

Notice that in this example $\tilde\rho(y_i)$ decreases in $|y_i|$, thus $|y_i| > 2.915$ is the
$\tilde r_S = 0.10$ selection rule yielding the maximal expected number of discoveries.
For $y_i > 0$, $\tilde\rho(y_i)$ is the conditional probability given $y_i$ that $\theta_i < 0$. $\tilde\rho(0) = 0.5$,
$\tilde\rho(3.111) = 0.176$, and $\tilde\rho(3.472) = 0.10$. Thus $|y_i| > 3.472$ is the selection
criterion suggested in (27) for $q = 0.10$.

To illustrate the results on the non-exchangeable random effect model, we
evaluated $EV$, $ER$ and the directional-FDR in $n = 10^5$ replications of the random
effect model simulated in Example 1.1 and the comparable non-exchangeable
random effect model described in Example 2.4. In both models the mean number
of discoveries was 919.9 (s.e. < 0.07), the mean number of false discoveries was
64.4 (s.e. < 0.03), and the mean directional-FDP was 0.070 (s.e. < 0.00003).



5 The relation between saBayes inference and 
Bayesian FDR methods 

The term Bayesian FDR methods refers to the multiple testing procedures
presented in Efron et al. (2001) and Storey (2002, 2003) for the following two
group mixture model. $H_i$, $i = 1 \cdots m$, are iid Bernoulli$(1 - \pi_0)$ random variables.
$H_i = 0$ corresponds to a true null hypothesis, while $H_i = 1$ corresponds
to a false null hypothesis. Given $H_i = j$, $Y_i$ is independently drawn from $f_j$, for
$j = 0, 1$.

The positive FDR (pFDR) corresponds to a rejection region $\Gamma$. It is defined as
$E(V/R \mid R > 0)$, where $R$ is the number of $y_i \in \Gamma$ and $V$ is the number of $y_i \in \Gamma$
with $H_i = 0$. Storey proves that

$$pFDR(\Gamma) = \Pr(H_i = 0 \mid Y_i \in \Gamma) \tag{30}$$

$$= \frac{\pi_0 \cdot \Pr(Y_i \in \Gamma \mid H_i = 0)}{\pi_0 \cdot \Pr(Y_i \in \Gamma \mid H_i = 0) + (1 - \pi_0) \cdot \Pr(Y_i \in \Gamma \mid H_i = 1)}, \tag{31}$$

with $\Pr(Y_i \in \Gamma \mid H_i = j) = \int_{y_i \in \Gamma} f_j(y_i) \, dy_i$. For the multiple testing procedure
each null hypothesis is associated with a rejection region $\Gamma_i$, determined by $y_i$;
the pFDR corresponding to $\Gamma_i$, called the q-value, is computed; and the null
hypothesis $H_i = 0$ is rejected if q-value < $q$. The local FDR is defined in Efron
et al. (2001) as the conditional probability given $Y_i = y_i$ that $H_i = 0$

$$fdr(y_i) = \frac{\pi_0 \cdot f_0(y_i)}{\pi_0 \cdot f_0(y_i) + (1 - \pi_0) \cdot f_1(y_i)}.$$

The multiple testing procedure based on the local FDR is to reject $H_i = 0$ if
$fdr(y_i) < q$.
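
The pFDR in (31) and the local FDR can be computed side by side for a concrete two group model. The sketch below uses hypothetical values that do not come from the paper ($\pi_0 = 0.9$, $f_0 = N(0, 1)$, $f_1 = N(3, 1)$, and a one-sided rejection region $\Gamma = \{y > c\}$) and checks numerically that the pFDR equals the average of the local FDR over the rejection region.

```python
import math

PI0 = 0.9   # hypothetical prior probability of a true null
MU1 = 3.0   # hypothetical mean of Y under the alternative


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Standard normal upper tail 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pfdr(c):
    """pFDR of the rejection region {y > c}, as in (31)."""
    null_part = PI0 * norm_sf(c)
    alt_part = (1.0 - PI0) * norm_sf(c - MU1)
    return null_part / (null_part + alt_part)


def local_fdr(y):
    """Pr(H = 0 | Y = y) in the two group model."""
    null_part = PI0 * norm_pdf(y)
    alt_part = (1.0 - PI0) * norm_pdf(y - MU1)
    return null_part / (null_part + alt_part)


def mean_local_fdr(c, hi=15.0, n=20000):
    """E[fdr(Y) | Y > c] under the marginal density, by the midpoint rule."""
    step = (hi - c) / n
    num = den = 0.0
    for k in range(n):
        y = c + (k + 0.5) * step
        m = PI0 * norm_pdf(y) + (1.0 - PI0) * norm_pdf(y - MU1)
        num += local_fdr(y) * m
        den += m
    return num / den


print(pfdr(2.0), mean_local_fdr(2.0), local_fdr(2.0))
```

In this example the local FDR at the boundary of the rejection region is larger than the pFDR of the region, so thresholding the local FDR at $q$ is a stricter rule than thresholding the q-value at $q$.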

Notice that Bayesian FDR methods can be expressed as a special case of
the FDR controlling selection rules presented in the previous section, in which
the components of the parameter vector are dichotomous. The parameter is
$H = (H_1 \cdots H_m)$, and $(H, Y)$ are generated by a random effect model: the
marginal distribution of $H_i$ is $\pi(H_i = j) = (1 - \pi_0)^j \cdot \pi_0^{1-j}$, $f_j$ is the likelihood,
$H_i$ is selected if $y_i \in \Gamma$, and selection is associated with declaring $H_i = 1$. Notice
also that Expression (31) is a special case of Expression (28): it is the "random"
effect saBayes risk for the loss function $I(H_i = 0)$, expressed as the
selection-adjusted prior probability of making a false discovery

$$\pi_\Gamma(H_i = 0) \propto \pi(H_i = 0) \cdot \Pr(Y_i \in \Gamma \mid H_i = 0).$$

Thus the equality in (30) proven by Storey is a special case of Corollary 4.3.
The local FDR is the "random" $\theta$ selection-adjusted posterior expected loss,
thus the multiple testing procedure based on the local FDR is a special case of
the selection rule in (27). Lastly, the relation between the local FDR and the
pFDR, $pFDR = E_{y \in \Gamma} fdr(y)$, follows from the definition of the saBayes risk in
(18).

Bayesian FDR methods are valid regardless of whether $H$ is a "random"
or "fixed" effect. However, in selective inference for $h(H) = H_i$, the
selection-adjusted posterior probability that $H_i = 0$ for a "random" $H$ is equal to the
local fdr. Whereas if $H$ is a "fixed" effect, or if $\pi_0$ is the non-informative prior
probability that $H_i = 0$, then the selection-adjusted posterior probability that
$H_i = 0$ is

$$\frac{\pi_0 \cdot f_\Gamma(y_i \mid H_i = 0)}{\pi_0 \cdot f_\Gamma(y_i \mid H_i = 0) + (1 - \pi_0) \cdot f_\Gamma(y_i \mid H_i = 1)},$$

for $f_\Gamma(y_i \mid H_i = j) = f_j(y_i) / \Pr(Y_i \in \Gamma \mid H_i = j)$ the selection-adjusted likelihood.

6 Analysis of microarray data 

We analyze the Dudoit and Yang (2003) swirl data set. The data includes 4
arrays of 8448 genes, comparing RNA from Zebrafish with the swirl mutation to
RNA from wild-type fish. For Gene $g$, $g = 1 \cdots 8448$, the parameters are $\mu_g$, the
expected log2-fold change in expression due to the swirl mutation, and $\sigma^2_g$, the
variance of the log2-fold change in expression.

In our analysis we assume that $(\theta, Y)$ are generated by a non-exchangeable
random effect model. $\sigma^2_g$ are iid "random" effects with scaled inverse chi-square
marginal prior density $\pi(\sigma^2_g)$ whose hyper-parameters, $s^2_0 = 0.052$ and $\nu_0 = 4.02$,
were derived by applying the R LIMMA package (Smyth, 2005) eBayes function
to the sample variances. $\mu_g$ are distinct independent "fixed" effects that are
elicited flat non-informative priors, $\pi_{ni}(\mu_g) \propto 1$. For assessing the FDR of the
selection rules we use the eBayes prior

$$\tilde\pi(\mu_g) = 8.5 \cdot \exp(-8.5 \cdot |\mu_g|) / 2,$$

which provided a good fit to the empirical distribution of $y_1 \cdots y_{8448}$. Given $\mu_g$
and $\sigma_g$, the sample variances $s^2_g$ are independent $\sigma^2_g \chi^2_3 / 3$, and the observed
mean log2 expression ratios $y_g$ are independent $N(\mu_g, \sigma^2_g / 4)$. Thus the marginal
likelihood is given by

$$f(y_g, s^2_g \mid \mu_g, \sigma^2_g) \propto \sigma_g^{-4} \exp\Big\{ -\frac{1}{2\sigma^2_g} \big[ 3 s^2_g + 4 (\mu_g - y_g)^2 \big] \Big\}. \tag{32}$$

Our goal in the analysis is to specify a selection rule for which the mean
directional error in declaring selected genes with $y_g > 0$ over-expressed and
declaring selected genes with $y_g < 0$ under-expressed is less than 0.05, and to
provide inference for the change in expression of selected genes.

6.1 Specifying the selection rules 

In the first part of our analysis we use the level $q = 0.10$ BH procedure to discover
differentially expressed genes; assess the directional-FDR of the selection rule
specified by the BH procedure; and compare its performance to the level $q = 0.10$
Bayesian FDR controlling selection rule based on moderated t statistics and the
most powerful level $q = 0.10$ Bayesian FDR controlling selection rule,
constructed according to the algorithm in Definition 4.6.

LIMMA implements a hybrid classical/Bayes approach in which $\mu_g$ are
assumed to be unknown constants while $\sigma^2_g$ are iid $\pi(\sigma^2_g)$. The moderated t statistic
is defined as $\tilde t_g = y_g / (\tilde s_g / 2)$, for $\tilde s^2_g = (\nu_0 s^2_0 + 3 s^2_g) / (\nu_0 + 3)$ the posterior mean
of $\sigma^2_g \mid s^2_g$. As $\tilde s^2_g / \sigma^2_g \sim \chi^2_{\nu_0 + 3} / (\nu_0 + 3)$, $(y_g - \mu_g) / (\tilde s_g / 2)$ are $(\nu_0 + 3)$ degree of
freedom t random variables. Thus the p-values LIMMA provides to test the null
hypotheses of non-differential expression are $p_g = 2 \cdot (1 - F_{\nu_0 + 3}(|\tilde t_g|))$, where $F_\nu$
is the $\nu$ degree of freedom t cdf. Applied at level $q = 0.10$ to the 8448 p-values
the BH procedure yielded 245 discoveries, corresponding to the rejection region
$|\tilde t_g| > 4.479$. The observed mean log2 expression ratios and sample standard
deviations of the 8448 genes are drawn in Figure 5. The BH discoveries are
the 245 observations beneath the solid blue curve $|\tilde t_g| = 4.479$. To see why this
rejection region corresponds to 0.05 directional FDR control, notice that for all
$\mu_g$ the probability of a directional error is less than $1 - F_{\nu_0 + 3}(4.479)$; thus
$12.08 = 8448 \cdot (1 - F_{\nu_0 + 3}(4.479))$ is a conservative estimate for the number of
false directional discoveries, and $0.049 = 12.08 / 245$ is a conservative estimate
for the directional FDR.

For comparison, the frequentist treatment of this problem would be testing
the null hypotheses of non-differential expression by 3 degree of freedom test
statistics $t_g = y_g / (s_g / 2)$. Since the 3 degree of freedom t-distribution has heavier
tails, $F_3^{-1}(1 - 0.1 / (2 \cdot 8448)) = 57.10$ while $\max(|t_g|)$ is only 27.90. Thus applying
the level $q = 0.1$ BH to $p_1 \cdots p_{8448}$, with $p_g = 2 \cdot (1 - F_3(|t_g|))$, yields 0 discoveries.
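
The tail probabilities used in these two paragraphs can be reproduced with a short numerical integration. The sketch below is an assumption-laden illustration: it evaluates the Student t upper tail by Simpson's rule after the substitution $x = 1/u$ (the helper names `t_pdf` and `t_sf` are hypothetical and only the standard library is used), then recomputes the conservative estimate of the number of false directional discoveries and checks the 3 degree of freedom threshold.

```python
import math


def t_pdf(x, nu):
    """Density of the Student t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))


def t_sf(t, nu, n=2000):
    """Upper tail Pr(T > t) for t > 0 and nu > 1.

    Substituting x = 1/u maps (t, infinity) to (0, 1/t); the transformed
    integrand vanishes at u = 0 and is integrated with Simpson's rule.
    """
    b = 1.0 / t
    h = b / n  # n must be even for Simpson's rule

    def g(u):
        return 0.0 if u == 0.0 else t_pdf(1.0 / u, nu) / (u * u)

    total = g(0.0) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0


m = 8448
nu0 = 4.02
expected_false = m * t_sf(4.479, nu0 + 3.0)   # compare with 12.08 in the text
directional_fdr = expected_false / 245.0      # compare with 0.049 in the text
tail_3df = t_sf(57.10, 3.0)                   # compare with 0.1 / (2 * 8448)
print(expected_false, directional_fdr, tail_3df, 0.1 / (2 * m))
```

The last comparison illustrates why the 3 degree of freedom statistics yield no discoveries: a statistic of about 57 is needed to reach the smallest BH threshold, far above the largest observed value of 27.90.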

In order to assess the directional FDR we derive the "random" $\theta$ saBayes
posterior distribution

^">"*'> = Pr((^) £ W ' (33) 

for the eBayes prior distribution $\tilde\pi(\mu_g, \sigma^2_g) = \tilde\pi(\mu_g) \cdot \pi(\sigma^2_g)$. We then integrate
out $\sigma^2_g$ in (33) to derive $\tilde\pi_S(\mu_g \mid y_g, s^2_g)$, the marginal "random" $\theta$ saBayes posterior
distribution of $\mu_g$, and the "random" $\theta$ posterior expected loss corresponding
to directional errors

$$\tilde\rho(y_g, s^2_g) = \int I\{sign(\mu_g) \ne sign(y_g)\} \cdot \tilde\pi_S(\mu_g \mid y_g, s^2_g) \, d\mu_g,$$

and use it to numerically compute the "random" $\theta$ saBayes risk corresponding
to the directional FDR

$$\tilde r_S(S_{marg}) = E_{\tilde m_S(y_g, s^2_g)} \big( \tilde\rho(y_g, s^2_g) \big),$$

for

$$\tilde m_S(y_g, s^2_g) = \frac{I((y_g, s^2_g) \in S_{marg}) \cdot \int \tilde\pi(\mu_g, \sigma^2_g) \cdot f(y_g, s^2_g \mid \mu_g, \sigma^2_g) \, d\mu_g \, d\sigma^2_g}{\int I((y_g, s^2_g) \in S_{marg}) \cdot \int \tilde\pi(\mu_g, \sigma^2_g) \cdot f(y_g, s^2_g \mid \mu_g, \sigma^2_g) \, d\mu_g \, d\sigma^2_g \, dy_g \, ds^2_g}.$$

$\tilde r_S$ for $|\tilde t_g| > 4.479$, the $q = 0.10$ BH procedure (solid blue curve in Figure 5),
is 0.024, while $|\tilde t_g| > 2.64$ (dashed blue curve in Figure 5) is the moderated
t selection rule with $\tilde r_S = 0.05$. It yields 1124 discoveries. The green curves
in Figure 5 correspond to the selection rules $\tilde\rho(y_g, s^2_g) < s$. The solid curve
corresponds to the selection rule with $s = 0.05$, which yields 559 discoveries.
The dashed curve corresponds to the selection rule with $s = 0.088$, for which
$\tilde r_S = 0.05$. This is the selection rule that yields the maximal expected number
of discoveries of all selection rules with $\tilde r_S = 0.05$. In this case it yields 1271
discoveries.

6.2 Providing saBayes inference 

In the second part of our analysis we provide saBayes inference for $\mu_{6239}$, the
expected log2-fold change in expression due to the swirl mutation for Gene
number 6239. Note that in the hybrid classical/Bayes approach it is not clear
how to apply the Benjamini and Yekutieli (2005) frequentist FCR adjustment.
The statistics for this gene (marked by the red plus sign in Figure 5) are
$y_{6239} = -0.435$ and $s^2_{6239} = 0.0173$, thus $\tilde t_{6239} = -4.51$.

The marginal posterior distributions of $\mu_{6239}$ are drawn in Figure 6. The
black curve corresponds to the non-informative prior unadjusted posterior

$$\pi(\mu_g, \sigma^2_g \mid y_g, s^2_g) \propto \pi_{ni}(\mu_g) \cdot \pi(\sigma^2_g) \cdot f(y_g, s^2_g \mid \mu_g, \sigma^2_g),$$

for which $(\mu_{6239} - y_{6239}) / (\tilde s_{6239} / 2) \sim t_{7.02}$. The posterior mean and mode equal
$y_{6239} = -0.435$, the 0.95 credible interval for $\mu_{6239}$ is [-0.66, -0.21], and the
posterior probability that $\mu_{6239} > 0$ and a directional error is committed is 0.0014.
The green curve corresponds to $\tilde\pi_S(\mu_{6239} \mid y_{6239}, s^2_{6239})$. Its posterior mode is
-0.36, the posterior mean is -0.31, the 0.95 credible interval is [-0.54, -0.01],
and the posterior probability that $\mu_{6239} > 0$ is 0.020.

As $\mu_g$ is elicited a non-informative prior and $\sigma_g$ is a "random" effect,
the selection-adjusted posterior distribution of $(\mu_g, \sigma^2_g)$ is proportional to the
joint truncated distribution in ([5]), with $\mu_g$ substituting the "fixed" $\lambda$ and $\sigma^2_g$
substituting the "random" $\theta$,

$$\pi_S(\mu_g, \sigma^2_g \mid y_g, s^2_g) \propto \pi(\sigma^2_g) \cdot \pi_{ni}(\mu_g) \cdot f(y_g, s^2_g \mid \mu_g, \sigma^2_g) / \Pr(|\tilde t_g| > a \mid \mu_g). \tag{34}$$

SaBayes inference for $\mu_{6239}$ is based on $\pi_S(\mu_g \mid y_g, s^2_g)$, the marginal selection
adjusted posterior of $\mu_{6239}$, derived by integrating out $\sigma^2_g$ from (34). The solid
blue curve is $\pi_S(\mu_g \mid y_g, s^2_g)$ for the selection rule $|\tilde t_g| > 4.479$. Its posterior
mode is -0.278, the posterior mean is -0.257, the 0.95 credible interval is
[-0.54, 0.02], and the posterior probability that $\mu_{6239} > 0$, and thus the Gene
was erroneously declared under-expressed, is 0.038. The dashed blue curve
corresponds to $|\tilde t_g| > 2.64$. In this case the shrinking towards 0 is weaker:
the posterior mode is -0.419, the posterior mean is -0.367, the 0.95 credible
interval is [-0.63, -0.02], and the posterior probability that $\mu_{6239} > 0$ is 0.017.

7 Discussion 

We have shown that selective inference adds an arbitrary element to Bayesian 
analysis. However it is important to note that the selection rule is determined 
before the data is observed, and once the selection rule is determined the entire 
process of providing saBayes inference is fully specified and is carried out the 
same way as Bayesian inference. The notable exception is eBayes methods in 
which the data is used twice in the analysis, first to elicit the prior distribu- 
tion and possibly to specify the selection rule, and then to produce posterior 
distributions. 

Our method of controlling the Bayesian FDR corresponds to the fixed rejection
region approach presented in Yekutieli and Benjamini (1999), which consists
of estimating the FDR in a series of nested fixed rejection regions and choosing
the largest rejection region with estimated FDR less than $q$. However, as the
pFDR of any selection rule $S_{marg}$ can be expressed as a saBayes risk, the problem
of controlling the Bayesian FDR in the random effect and non-exchangeable
random effect models is reduced to a Bayesian decision problem of finding the
"optimal" selection rule with $\tilde r_S \le q$. Our Bayesian FDR controlling methods
can, in principle, provide tight FDR control for any discovery event $A_{marg}(y_i)$,
whereas frequentist FDR controlling methods may provide tight FDR control
when the discovery is rejecting a simple null hypothesis but, as illustrated by the
performance of the BH procedure in controlling the directional-FDR, can only
bound the FDR when the discoveries are rejecting composite null hypotheses.

In general, the price paid by using stricter selection rules is a reduction in the
information the data provides for selective inference. Example 3.3 suggests that
when specifying selection rules, in addition to the tradeoff between allowing too
many false (or wasteful) discoveries and failing to make enough discoveries, it
may also be advisable to take into account the quality of the inference provided
for selected parameters.

Lastly, even though we discussed selection rules that control the FDR 
incurred when selecting a subset of parameters and used either non-informative 
priors or random-effect priors, our main result, that Bayesian inference for 
"fixed" and "mixed" effects must be corrected for selection, also applies when 
the prior distribution is elicited according to prior knowledge and regardless of 
why selection is applied. 
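
To make the truncation view concrete, the following is a minimal numerical 
sketch of a selection-adjusted posterior for a "fixed" effect. It assumes 
Y | theta ~ N(theta, 1), a flat prior, and the selection rule |Y| > c, so that 
the adjusted posterior is proportional to 
$\pi(\theta)\, f(y \mid \theta) / \Pr(|Y| > c \mid \theta)$. The unit-variance 
model, the grid approximation, and the function names are assumptions made 
for illustration only. 

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_sf(x):
    """Standard normal upper tail probability Pr(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def selection_prob(theta, c):
    """Pr(|Y| > c | theta) under the assumed model Y ~ N(theta, 1)."""
    return normal_sf(c - theta) + normal_sf(c + theta)


def adjusted_posterior(y, c, grid):
    """Flat-prior posterior density on an equally spaced grid, using the
    likelihood truncated to the selection event |Y| > c."""
    weights = [normal_pdf(y - t) / selection_prob(t, c) for t in grid]
    step = grid[1] - grid[0]
    total = sum(weights) * step  # Riemann-sum normalising constant
    return [w / total for w in weights]


def posterior_mean(grid, density):
    """Riemann-sum approximation of the posterior mean on the grid."""
    step = grid[1] - grid[0]
    return sum(t * d for t, d in zip(grid, density)) * step
```

For an observation just above the cutoff, for example y = 3.5 with c = 3.111, 
dividing by the selection probability moves posterior mass toward zero, so 
the adjusted posterior mean falls below y. For an observation far above the 
cutoff the selection probability is close to 1 near y and the adjustment has 
almost no effect. 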

Figure 1: Simulated example - scatter plot of $|Y_i| > 3.111$ components. $Y_i$ 
values are drawn on the abscissa of the plot, the ordinates are $\theta_i$ values. 
The red lines are marginal 0.95 CIs. The green lines are 0.05 FCR-adjusted CIs. 

Figure 2: Simulated example - scatter plot of $Y_i > 3.111$ realizations of 
$(\theta_i, Y_i)$ in the "random" effect truncated sampling model (left panel - 
466 observations), the "mixed" effect truncated sampling model (middle panel - 
498 observations), and the "fixed" effect truncated sampling model (right 
panel - 501 observations). The solid blue curves are the selection-adjusted 
0.95 posterior credible intervals for $\theta_i$, and the dashed blue curves are 
the selection-adjusted posterior means. 

Figure 3: Simulated example - saBayes posterior distributions. The posterior 
distributions for $\theta_{12647}$ are drawn in the left panel, the posterior 
distributions for $\theta_{90543}$ are drawn in the right panel. The black curves 
are unadjusted posteriors; the blue curves are "random" effect model saBayes 
posteriors; the green curves are non-informative prior saBayes posteriors. 
Figure 4: Simulated example - scatter plot of $Y_i > 3.111$ components. The 
dashed green and red lines are the CIs from Figure 1. The blue curves are the 
"random" effect model saBayes 0.95 credible intervals. The light-blue curves 
are the non-informative prior saBayes 0.95 credible intervals. 

Figure 5: Swirl data - scatter plot of sample means and standard deviations. 
The abscissa of the plot is $\bar{y}_g$, the ordinates are $s_g$. The solid blue 
curve is $|t_g| = 4.479$. The dashed blue curve is $|t_g| = 2.64$. The solid green 
curve is $p(\bar{y}_g, s_g) = 0.05$. The dashed green curve is 
$p(\bar{y}_g, s_g) = 0.088$. The red plus sign is $(\bar{y}_{6239}, s_{6239})$. 
Figure 6: Swirl data - marginal posterior densities of $\mu_{6239}$. The black 
curve is the non-informative prior unadjusted posterior distribution. The 
green curve is the eBayes prior posterior distribution. The solid blue curve 
is the non-informative prior saBayes posterior distribution for the selection rule 
$|t_g| > 4.479$. The dashed blue curve is the non-informative prior saBayes 
posterior distribution for the selection rule $|t_g| > 2.64$. 